Abstract
This paper describes a web-based
system called DIAL consisting of a package of
tools, specifications, and documentation that
will allow a data provider to create a data
server for cataloging, documenting, and
distributing scientific data and to provide data
display and analysis tools. The DIAL software is
freely available to users. The system was
developed as part of the National
Aeronautics and Space
Administration's (NASA's) EOSDIS
technology prototype efforts, in
collaboration with the National
Center for Supercomputing
Applications (NCSA). The system was
primarily developed for the Earth science
community, but could be useful for other science
and education communities. Since DIAL is a
modular, extensible system, tools developed by
others can easily be integrated with this
system.
Introduction and Background
WWW technologies have greatly increased the
on-line accessibility of science data. Thousands
of Web pages have been created worldwide by
government agencies, universities, and private
industry. However, no single package of tools
has been available both to create a scientific
data server with an inventory of its data
holdings and to display and analyze the data.
The
Earth Observing System (EOS) is a very
large, ambitious project funded by NASA as part
of the Mission
to Planet Earth. The
EOS Data and Information System (EOSDIS) is
the portion of the project charged with handling
the vast amounts of data gathered by EOS.
Possibly the most visible function of EOSDIS is
the archiving and distribution of these enormous
amounts of data. The EOS-AM satellite, planned
for launch in mid-1999, will generate nearly one
terabyte of data per day from its five onboard
scientific instruments.
The current plan calls for the functions of
processing and distribution of the EOS-AM data
to be carried out by eight Distributed Active
Archive Centers (DAACs). Each of these DAACs
will contain a large, complex data system
designed to handle a large volume of data search
and ordering transactions.
EOSDIS will produce and distribute over 200
data products. Historically, science data are
archived in many different native and standard
formats (e.g., HDF, CEOS Superstructure, FITS,
netCDF, CDF, BUFR, GRIB, etc.). The WWW
community has benefited by adopting just a few
standards (e.g., HTML, HTTP, and GIF).
Similarly, adoption of a few standard formats
will greatly facilitate the access to and
exchange of science data in EOS. Therefore,
NASA's EOSDIS project adopted HDF-EOS as
the standard format for the production and
distribution of science data from the EOS
project. HDF-EOS, an extension of NCSA's
Hierarchical
Data Format (HDF), adds three more data
models, namely Swath, Grid, and Point, to HDF.
The HDF-EOS software library is available on UNIX
and Windows platforms. More information about
HDF-EOS and sample HDF-EOS data sets can be
found at http://hdf-eos.gsfc.nasa.gov.
Recent recommendations by the National
Research Council (NRC) suggested that an EOSDIS
built from many small interoperable data systems
working loosely together, rather than a few
tightly coupled large data systems, may be
desirable. NASA is experimenting with NRC's
recommendations by implementing the Earth
Science Information Partnership (ESIP) program
to build a federation of Earth science
information providers.
All of the above implicit and explicit
requirements call for a simple, compact,
scalable, flexible, interoperable, and
standards-compliant data system for distributing
Earth science data through the Web. In response
to such
requirements, we have developed a system called
Data and Information Access Link (DIAL). This
paper describes the architecture and
functionality of DIAL.
Objectives and Functionality
The main objective of DIAL is to provide an
integrated package of software tools to data
providers for distributing data through the Web.
With DIAL, a data provider can:
- set up a low-end workstation (Windows 95/NT or
  UNIX) as a Web server,
- populate it with data and metadata,
- establish Web pages to provide search and
  selection of data, and
- provide client tools to be used with Web
  browsing software to further examine or
  manipulate the data.
By connecting to a DIAL site through their
web browsers, data users can:
- search by spatial, temporal, and
  parameter-based criteria
- view data and metadata
- browse, subset, and subsample data
- download data on-line in multiple formats
- make x-y plots of tabular data
DIAL is flexible, scalable, and
interoperable. Data providers ranging from
individual researchers to large data centers can
use it. Once a DIAL site is set up, any user
with a Web browser can access data and need not
know the technical details of the server-side
software.
DIAL has potential applications wherever a
federation of data systems needs to
interoperate. DIAL is a compact suite of
software, specifications, and documentation,
developed and assembled primarily from
off-the-shelf public domain software and easily
customizable by the site administrator. DIAL is
a Web-based client-server system that takes
advantage of many Web technologies.
DIAL supports both HDF and HDF-EOS formats.
An earlier version of DIAL, called the Scientific
Data Browser (SDB) and developed by NCSA, works
with FITS and netCDF data. DIAL can also easily be
extended to work with other data formats.
Applications of DIAL
The current version of DIAL is 2.0, which
provides a very low cost Web-based client/server
package to individuals and groups desiring to
provide access to collections of science
data.
DIAL can be used in the following
scenarios:
- As a data distribution system
- As a data search and access system
- As an extended catalog system
Although the current DIAL implementation is
tailored to work with Earth science data, its
architecture makes it easy to extend DIAL to
other science data. Users of DIAL might include
principal investigators of scientific research,
field-campaign data collectors, developers of
special or experimental data products, and K-12
and university Earth science educators. With
continuing development and implementation, the
eventual potential of such a system is nearly as
large as that of the Web itself.
Availability of DIAL
Currently, DIAL supports the following
platforms: DEC Alpha, SGI, Sun Solaris, HP,
Windows 95 and NT. The DIAL package is less than
5 MB and is freely available from the web site
at http://laits.gmu.edu/DownloadInterface.html.
The only requirement for DIAL is that the data to
be distributed be in HDF or HDF-EOS format.
The Architecture
The Web client-server based DIAL architecture
is shown in Figure 1. DIAL consists of a number
of client helper application tools as well as
server utilities and CGI executables. The
client-side browsers can directly access and
manipulate files on a DIAL server. Some
components of the architecture (on both the
client and server sides) are available as
off-the-shelf public domain tools, while others
will be developed specifically for DIAL.
One of our design goals was to have the
more generic functions implemented on the server
and the more application specific functions on
the client side as helper applications. The
server, then, is responsible for helping the
user to reduce his/her network bandwidth
requirements by providing easy identification of
the desired data file and a first approximation
of the exact data within that file. The helper
applications can then aid the user in performing
discipline specific analysis on the portion of
the file actually downloaded to the local
system. This philosophy will preserve the
generality of the server, while reducing network
bandwidth requirements.
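The bandwidth argument above can be made concrete
with a little arithmetic. The following is a
minimal sketch, not part of DIAL: the array
dimensions, sample size, subset window, and stride
are all hypothetical values chosen only for
illustration.

```python
def array_bytes(rows, cols, bytes_per_value):
    """Size in bytes of a full 2-D array."""
    return rows * cols * bytes_per_value


def subset_bytes(row_range, col_range, stride, bytes_per_value):
    """Size in bytes after cutting the array to half-open [start, stop)
    row and column ranges and keeping every `stride`-th sample."""
    rows = len(range(row_range[0], row_range[1], stride))
    cols = len(range(col_range[0], col_range[1], stride))
    return rows * cols * bytes_per_value


# Hypothetical granule: 7200 x 3600 grid of 2-byte integers.
full = array_bytes(7200, 3600, 2)

# Hypothetical request: a 1000 x 1000 window, every 4th sample.
part = subset_bytes((1000, 2000), (500, 1500), 4, 2)

print(full, part, full / part)
```

Under these assumed numbers, the subsetted and
subsampled request is roughly 400 times smaller
than the full file, which is the saving the server
is meant to provide before any helper application
runs on the client.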
Server Side Components
Currently, two server-side components, namely
the DIAL Server System and Data Management, are
available. The following sections discuss those
components in some detail.
Data Management
The functions of data management include the
ingestion of data into DIAL, the creation of an
inventory for data search, and the
administration of the data and software.
Data Ingestion and Inventory Creation
If the data to be distributed are already in
HDF or HDF-EOS, no data conversion is needed.
Otherwise, the data must first be converted into
one of these DIAL-supported formats. The main
function of the ingestion component is to
prepare data files to be added into the server.
Such preparation includes translation of
"foreign" data formats into HDF or HDF-EOS and
the addition of a standardized metadata block
(in PVL or ODL) into the files. Any tool that
can output an HDF file can work in this capacity
in conjunction with a simple metadata
extraction/encoding tool. Currently, DIAL
provides data translators to convert the ARC/INFO
exchange format into HDF-EOS. The GeoTIFF and
Shape translators will be available in 1999.
More data translators and some generic data
description tools are planned.
In addition, data providers can develop their
own data translators to convert their specific
data into HDF or HDF-EOS. Both the HDF and
HDF-EOS libraries are freely available for
building customized data translators. To add the
standard metadata block into the HDF or HDF-EOS
files, DIAL provides a tool called "meta". These
metadata will be used by DIAL to automatically
create the searchable inventory tables. Data
providers can also use DIAL without creating
metadata to view HDF and HDF-EOS files. In
addition, metadata stored as global attributes
within an HDF file can be viewed through DIAL.
We are planning to update the metadata tools in
the next release.
The inventory tables are created in DIAL by
using a tool called "crinv", which builds the
inventory table based on the metadata in the
individual data files. Data providers can
customize the inventory tables for their data
files through the configuration file.
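The step from per-file metadata to an inventory
table can be sketched as follows. This is an
illustration only, not the implementation of
"meta" or "crinv": the keyword names, the file
names, and the simplified flat "KEY = VALUE"
syntax are assumptions (real PVL/ODL also allows
groups and objects, which this sketch ignores), and
the metadata is held in strings rather than read
from HDF files.

```python
def parse_metadata_block(text):
    """Parse flat 'KEY = VALUE' lines up to an END marker.

    Simplified stand-in for a PVL/ODL reader: no groups, no objects.
    """
    record = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*"):
            continue  # skip blank lines and comments
        if line == "END":
            break
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a keyword assignment
        record[key.strip()] = value.strip().strip('"')
    return record


def build_inventory(files, fields):
    """Build one inventory row per file.

    files:  mapping of file name -> metadata block text
    fields: inventory columns chosen by the data provider
    """
    rows = []
    for name in sorted(files):
        meta = parse_metadata_block(files[name])
        row = {"FILE": name}
        for field in fields:
            row[field] = meta.get(field, "")
        rows.append(row)
    return rows


# Hypothetical metadata blocks for two hypothetical files.
FILES = {
    "sst_1998_01.hdf": 'PARAMETER = "Sea Surface Temperature"\n'
                       "START_DATE = 1998-01-01\nSTOP_DATE = 1998-01-31\nEND\n",
    "ndvi_1998_02.hdf": 'PARAMETER = "NDVI"\n'
                        "START_DATE = 1998-02-01\nSTOP_DATE = 1998-02-28\nEND\n",
}

inventory = build_inventory(FILES, ["PARAMETER", "START_DATE", "STOP_DATE"])
for row in inventory:
    print(row)
```

The `fields` argument plays the role that the
configuration file plays in the text: it decides
which metadata keywords become inventory columns.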
DIAL provides two options to store inventory
tables:
- For data providers with no access to
  commercial database packages, DIAL provides a
  simple flat-file database (as a Vdata object
  in the inventory HDF file) for storing the
  inventory tables.
- For data providers with access to JDBC-capable
  databases, "crinv" will store the inventory
  table in their databases through the JDBC
  connection.
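The two storage options can be pictured as two
interchangeable back ends behind the same set of
inventory rows. The sketch below is an analogy
under stated assumptions, not DIAL code: a CSV
stream stands in for the Vdata flat file, Python's
built-in sqlite3 module stands in for a
JDBC-capable database, and the column names and
rows are hypothetical.

```python
import csv
import io
import sqlite3

FIELDS = ["FILE", "PARAMETER", "START_DATE", "STOP_DATE"]

# Hypothetical inventory rows.
ROWS = [
    {"FILE": "sst_1998_01.hdf", "PARAMETER": "Sea Surface Temperature",
     "START_DATE": "1998-01-01", "STOP_DATE": "1998-01-31"},
    {"FILE": "ndvi_1998_02.hdf", "PARAMETER": "NDVI",
     "START_DATE": "1998-02-01", "STOP_DATE": "1998-02-28"},
]


def store_flat(rows, stream):
    """Flat-file option: write the table to a text stream (CSV stand-in)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def store_db(rows, conn):
    """Database option: write the table through a database connection."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory ("
        "file TEXT PRIMARY KEY, parameter TEXT, "
        "start_date TEXT, stop_date TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO inventory VALUES (?, ?, ?, ?)",
        [tuple(row[f] for f in FIELDS) for row in rows],
    )
    conn.commit()


flat = io.StringIO()
store_flat(ROWS, flat)

conn = sqlite3.connect(":memory:")
store_db(ROWS, conn)
print(conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0])
```

Keeping the rows independent of the back end is
what lets the same inventory builder serve both
small sites and sites with a database.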
Administration Tools
Administration tools consist of inventory
maintenance and data ingest support tools. These
tools will aid the site administrator in
populating the server and tailoring the
interface to the specific needs of the target
user community. The users' view of the server
contents can be customized through the choice of
inventory fields presented and the creation of
indexes to support specific access paths. An
advertisement function can also be added to the
administration tools. For instance, in EOSDIS,
we have plans to extract information from the
inventory and include it as a part of a
higher-level directory system located elsewhere
on the Web. Utilities available or planned
are:
- sorting utilities
- various data maintenance utilities
- data availability advertisement tools
- software maintenance utilities
These tools will help data providers keep track
of their data holdings and facilitate user
access. They can be used to create the
inventory and to add, delete, and edit inventory
records.
The Server System
The DIAL server system consists of two major
components: dib_search and dib_view. Both run
as CGI programs. dib_search works interactively
with users to find the data they want by
searching the inventory tables against their
search criteria. Currently, DIAL provides a
combination of spatial, temporal, and
parameter-based search. dib_view can talk either
to an ODBC-capable database through its JDBC
interface or to the flat-file inventory table.
Both HTML and Java-based user interfaces are
provided.
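A combined spatial, temporal, and parameter-based
search over an inventory can be sketched as
follows. This is a minimal illustration, not the
logic of dib_search: the record layout, file
names, and matching rules (bounding-box overlap
without dateline wraparound, date-range overlap,
case-insensitive substring match) are assumptions.

```python
from datetime import date


def boxes_overlap(a, b):
    """True if boxes (west, south, east, north) intersect.

    Assumes neither box crosses the dateline.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def periods_overlap(start1, stop1, start2, stop2):
    """True if the two closed date ranges share at least one day."""
    return start1 <= stop2 and start2 <= stop1


def search(inventory, box=None, start=None, stop=None, parameter=None):
    """Return file names of records matching every criterion given."""
    hits = []
    for rec in inventory:
        if box is not None and not boxes_overlap(rec["box"], box):
            continue
        if (start is not None and stop is not None
                and not periods_overlap(rec["start"], rec["stop"], start, stop)):
            continue
        if (parameter is not None
                and parameter.lower() not in rec["parameter"].lower()):
            continue
        hits.append(rec["file"])
    return hits


# Hypothetical inventory records.
INVENTORY = [
    {"file": "sst_1998_01.hdf", "box": (-180, -90, 180, 90),
     "start": date(1998, 1, 1), "stop": date(1998, 1, 31),
     "parameter": "Sea Surface Temperature"},
    {"file": "ndvi_africa_1998_02.hdf", "box": (-20, -35, 55, 38),
     "start": date(1998, 2, 1), "stop": date(1998, 2, 28),
     "parameter": "Normalized Difference Vegetation Index"},
    {"file": "sst_1998_02.hdf", "box": (-180, -90, 180, 90),
     "start": date(1998, 2, 1), "stop": date(1998, 2, 28),
     "parameter": "Sea Surface Temperature"},
]

print(search(INVENTORY, box=(10, -10, 40, 10),
             start=date(1998, 2, 1), stop=date(1998, 2, 15)))
```

Each criterion is optional, so the same function
covers a purely spatial query, a purely temporal
one, or any combination of the three.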
Once the required data are found, users can
exploit the data through interaction with
dib_view. dib_view provides the following data
manipulation and access functions:
- Geographic, temporal, parameter,
  array-coordinate, and record-based subsetting,
  subsampling, and downloading.
- On-the-fly browse image generation and
  display.
- Multi-variable X-Y plotting.
- On-the-fly reformatting for users to
  download data in HDF, HDF-EOS, plain binary,
  ASCII, HTML, and GIF formats.
- Metadata display.
On the server side, dib_view currently works only
with data in HDF or HDF-EOS formats. However, we
plan to add more DIAL-supported formats in the
near future.
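Array-coordinate subsetting, subsampling, and
reformatting to ASCII can be illustrated on a small
in-memory grid. This is a sketch under stated
assumptions rather than dib_view code: the grid is
a plain list of lists instead of an HDF data set,
the geographic lookup assumes a regular
latitude/longitude grid with its first cell at the
north-west corner, and all function names are
hypothetical.

```python
def subset(grid, row_start, row_stop, col_start, col_stop, stride=1):
    """Cut half-open [start, stop) ranges and keep every `stride`-th
    row and column."""
    return [row[col_start:col_stop:stride]
            for row in grid[row_start:row_stop:stride]]


def latlon_to_index(lat, lon, north, west, cell_deg):
    """Row and column of the cell containing (lat, lon) on a regular
    grid whose upper-left corner is at (north, west)."""
    row = int((north - lat) / cell_deg)
    col = int((lon - west) / cell_deg)
    return row, col


def to_ascii(grid):
    """Reformat a 2-D grid as space-separated text, one row per line."""
    return "\n".join(" ".join(str(v) for v in row) for row in grid)


# Hypothetical 6 x 8 grid; each value encodes its row and column.
GRID = [[r * 10 + c for c in range(8)] for r in range(6)]

# Rows 1..4 and columns 2..7, keeping every 2nd sample.
piece = subset(GRID, 1, 5, 2, 8, stride=2)
print(to_ascii(piece))

# Geographic criteria reduce to array indices on a regular grid.
print(latlon_to_index(45.0, -90.0, north=90.0, west=-180.0, cell_deg=1.0))
```

A geographic request can thus be served by
converting its corner coordinates to indices and
then applying the same array-coordinate subsetting.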
Client Side Components
The client-side tools will be able to access
an HTTP server containing HDF and HDF-EOS files.
The client will consist of a Java GUI, a web
browser, and a suite of helper applications to
enhance and extend the capabilities of the
server. The Java GUI is composed of Java applets
displaying spatial, temporal, and attribute
search panels. The spatial panel component also
includes a two-dimensional world map.
Helper applications to display data, to dump
and extract the data, and to analyze the data
can all be linked to this system via MIME types
and proper browser configuration. Many such
applications, such as the Java HDF Viewer (JHV)
and hdp (an HDF dumper utility), are already
freely available on the Web. Commercial tools and
programs using IDL and other image processing
packages can also be linked to the system.
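The MIME-type linkage works by mapping a file
type to a helper program. The sketch below shows
the idea with Python's standard mimetypes module.
It is not a browser configuration: the MIME type
"application/x-hdf", the ".hdf" extension mapping,
and the helper command name "jhv" are assumptions
made for illustration.

```python
import mimetypes

# Assumed MIME type for HDF files.
HDF_MIME = "application/x-hdf"

# A private type table, so the mapping does not depend on system files.
types = mimetypes.MimeTypes()
types.add_type(HDF_MIME, ".hdf")

# Hypothetical helper table: MIME type -> command to launch.
HELPERS = {HDF_MIME: "jhv"}


def helper_for(filename):
    """Return the helper command registered for a file, or None."""
    mime, _encoding = types.guess_type(filename)
    return HELPERS.get(mime)


print(helper_for("granule.hdf"))
print(helper_for("notes.txt"))
```

A browser performs the same lookup when a server
labels a response with a content type: if a helper
is registered for that type, the downloaded file is
handed to it.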
- Browsers: Netscape and MS Internet Explorer
- Helper applications:
  - Java-based HDF Viewer (JHV)
  - LinkWinds
  - EOSView, which can display HDF files
  - hdp (HDF dumper), which provides limited
    subsetting capabilities
  - many other useful helper applications for
    UNIX and PC, which can be added to this list
- Configuration documentation for several
  browsers (Netscape Navigator and MS Internet
  Explorer)
- Software tools to subset data
Conclusion
Although the focus of the current
architecture is to provide access to HDF and
HDF-EOS files, the server concept can be
extended so that the system eventually provides
access to data in many other formats.
Some DIAL test sites:
- EOSDIS test site
- ACE
- NOAA PEML
- JPL DAAC
- NASDA
- DERA
Acknowledgement
The authors thank NASA for financial assistance
through its technology prototype grant. Several
people have participated in the design and
development of the software. We would like to
thank Ted Meyer, Mike Folk, Nancy Yeager,
G. Ponnamparampillai, Jon Pals, Radhakrishna
Garge, and Khoa Doan.