What is HDF5?
All NISAR standard products are in Hierarchical Data Format version 5 (HDF5). HDF5 is a programming library and file format designed to store, organize, and access large scientific datasets. NISAR uses HDF5 to systematically organize radar data and metadata in a way that is both efficient and easy to read, share, and analyze.
HDF was originally developed by the University of Illinois’ National Center for Supercomputing Applications (NCSA) to support data sharing within the scientific community. HDF5 represents a significant redesign compared to earlier versions of HDF, with a more flexible and powerful internal structure. For additional details, users can consult the official HDF documentation at
https://
At a high level, an HDF5 file functions as a container that organizes data into a hierarchy of objects, such as groups, datasets, and datatypes. In general, radar layers are organized into two groups: frequencyA/ and (potentially) frequencyB/. Note that nothing is stored at the root /.
Groups
An HDF5 group is analogous to a folder within an HDF5 file. Groups can hold datasets, datatypes, and other groups (subgroups), much like directories in a file system. In a NISAR product, datasets are organized by nesting groups. For example, in a NISAR GCOV product, a dataset may be stored at a path such as:
/science/LSAR/GCOV/grids/frequencyA/HH
In this path, science, LSAR, GCOV, grids, and frequencyA are groups, and HH is a dataset contained within the frequencyA group.
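The nesting described above can be sketched with the h5py Python library. This is a minimal illustration, not a real NISAR product: the file name, array contents, and in-memory driver are assumptions chosen so that nothing is written to disk.

```python
import h5py
import numpy as np

# Hypothetical in-memory file; backing_store=False means nothing is saved to disk.
with h5py.File("example_gcov.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups along the path are created automatically.
    f.create_dataset(
        "/science/LSAR/GCOV/grids/frequencyA/HH",
        data=np.zeros((4, 4), dtype=np.float32),  # placeholder values
    )

    freq_a = f["/science/LSAR/GCOV/grids/frequencyA"]
    is_group = isinstance(freq_a, h5py.Group)           # frequencyA is a group
    is_dataset = isinstance(freq_a["HH"], h5py.Dataset)  # HH is a dataset
    children = list(freq_a.keys())

    # Collect the path of every object in the file (relative to the root).
    all_paths = []
    f.visit(all_paths.append)

print(children)
print(all_paths)
```

The same dictionary-style indexing works at every level, so `f["science"]["LSAR"]` and `f["/science/LSAR"]` refer to the same group.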
Datasets
An HDF5 dataset holds the actual data, typically as a multidimensional array or a table stored within the HDF5 file. Each dataset consists of the data values, a dataspace (the dimensions of the array), a datatype, and optional attributes such as units, valid range, time, and other descriptions.
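As a rough sketch of these components, the following h5py snippet creates a small dataset and inspects its dataspace and datatype. The dataset name HH is borrowed from the path example; the file name and values are made up for illustration.

```python
import h5py
import numpy as np

# Hypothetical in-memory file with placeholder values.
with h5py.File("example_dataset.h5", "w", driver="core", backing_store=False) as f:
    values = np.arange(12, dtype=np.float32).reshape(3, 4)
    dset = f.create_dataset("HH", data=values)

    shape = dset.shape   # dataspace: the dimensions of the array
    dtype = dset.dtype   # datatype: how each element is interpreted
    row = dset[1, :]     # only the requested slice is read into memory

print(shape, dtype)
print(row)
```

Slicing a dataset returns a NumPy array, so large layers can be read one region at a time rather than all at once.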
Attributes
An HDF5 attribute is a small piece of information that describes a group or dataset. Note that an attribute does not store the data itself. Attributes provide important context that helps users correctly interpret the values within a dataset. Common examples include:
Units of measurement
Descriptions of what the data represent
Valid ranges
Dates of acquisition
Processing details
Storing this information with the data helps ensure that datasets can be understood and used correctly without relying on external documentation.
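A minimal h5py sketch of attaching and reading attributes is shown below. The attribute names and values (units, description, valid_min) are hypothetical and are not taken from the NISAR product specification.

```python
import h5py
import numpy as np

# Hypothetical in-memory file; attribute names and values are illustrative only.
with h5py.File("example_attrs.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("HH", data=np.ones((2, 2), dtype=np.float32))

    # Attributes live alongside the dataset they describe.
    dset.attrs["units"] = "unitless"
    dset.attrs["description"] = "Example backscatter layer"
    dset.attrs["valid_min"] = 0.0

    # Iterating over .attrs yields attribute names.
    attrs = {name: dset.attrs[name] for name in dset.attrs}

for name, value in attrs.items():
    print(f"{name}: {value}")
```

Groups expose the same `.attrs` interface, so group-level metadata can be read in the same way.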
Datatypes
An HDF5 datatype describes the kind of data that is being stored. A datatype specifies both how the values in a dataset are interpreted and how they are laid out in storage. Datatypes fall into three categories: atomic datatypes, composite datatypes, and named datatypes.
A summary of some important datatypes is given below. For more details on HDF5 datatypes and their uses, see the official HDF5 Datatypes documentation.
Atomic Datatypes
Atomic datatypes are typically the simplest datatypes. They serve as building blocks for more complex datatypes. Common atomic datatypes include:
Time
Bitfield
String
Reference
Opaque
Integer
Float
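As an illustration of three of these atomic datatypes (integer, float, and string), the h5py sketch below stores one small dataset of each kind and reads back its type. The dataset names and values are assumptions made for the example.

```python
import h5py
import numpy as np

# Hypothetical in-memory file with one dataset per atomic datatype.
with h5py.File("example_atomic.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("count", data=np.array([1, 2, 3], dtype=np.int32))
    f.create_dataset("scale", data=np.array([0.5, 1.5], dtype=np.float64))

    # Variable-length UTF-8 strings use a dedicated h5py dtype.
    labels = f.create_dataset(
        "label", shape=(1,), dtype=h5py.string_dtype(encoding="utf-8")
    )
    labels[0] = "HH"

    integer_dtype = f["count"].dtype
    float_dtype = f["scale"].dtype
    string_info = h5py.check_string_dtype(f["label"].dtype)

print(integer_dtype, float_dtype, string_info)
```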
Derived datatypes are customized atomic datatypes, commonly used for N-bit integers, floating-point formats, and other nonstandard data representations. They enable efficient and precise storage when data do not conform to standard numeric formats. Derived datatypes are useful because they:
Allow custom storage with specific bit lengths
Support values that do not follow standard integer or floating-point formats
Preserve the original format in which the data were recorded
Composite Datatypes
Composite datatypes are combinations of other datatypes. Some important composite datatypes are described below.
Array datatypes represent fixed-size, multi-dimensional arrays of a specified base datatype, where the array shape is defined as part of the datatype.
NISAR example: an array datatype used within a metadata or structured record to store a fixed-size collection of values (e.g., a constant-dimension vector or matrix associated with an acquisition or product record).
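One way to see an array datatype in practice is through a NumPy structured dtype with a fixed-shape field, which h5py stores as an HDF5 array member inside a record. The field names (time, position) and the dataset name below are hypothetical.

```python
import h5py
import numpy as np

# Each record holds a scalar time and a fixed-size 3-element vector.
# Field names are illustrative, not taken from a NISAR specification.
record_dtype = np.dtype([("time", "f8"), ("position", "f8", (3,))])
records = np.zeros(2, dtype=record_dtype)
records["time"] = [0.0, 10.0]
records["position"] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

with h5py.File("example_array.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("example_records", data=records)

    # The (3,) shape is part of the datatype, not of the dataset's dataspace.
    field_shape = dset.dtype["position"].shape
    dataset_shape = dset.shape
    stored = dset[...]

print(field_shape, dataset_shape)
print(stored["position"])
```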
Variable-length datatypes represent one-dimensional arrays of a specified base datatype, with a variable number of items.
NISAR example: a variable-length array used to store lists of contributing looks, burst indices, or quality flags, where the number of entries may vary between pixels.
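A variable-length datatype can be sketched in h5py with `h5py.vlen_dtype`. The dataset name burst_indices and the stored values are assumptions for illustration; each element holds a list of a different length.

```python
import h5py
import numpy as np

# Each element is a one-dimensional int32 array of arbitrary length.
vlen_int = h5py.vlen_dtype(np.dtype("int32"))

with h5py.File("example_vlen.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("burst_indices", shape=(3,), dtype=vlen_int)
    dset[0] = [1, 2]
    dset[1] = [7]
    dset[2] = [3, 4, 5]

    lengths = [len(dset[i]) for i in range(3)]
    first = dset[0].tolist()
    base_dtype = h5py.check_vlen_dtype(dset.dtype)  # base type of the entries

print(lengths)
print(first, base_dtype)
```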
Compound datatypes represent collections of named fields, each with its own datatype.
NISAR example: storing several related per-pixel values together (e.g., coherence, incidence angle, and a validity flag) as one record instead of separate datasets.
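The per-pixel record idea can be sketched with a NumPy structured dtype, which h5py maps to an HDF5 compound datatype. The field names follow the example in the text; the values, dataset name, and field widths are made up.

```python
import h5py
import numpy as np

# One record per pixel: three named fields, each with its own datatype.
pixel_dtype = np.dtype(
    [("coherence", "f4"), ("incidence_angle", "f4"), ("valid", "u1")]
)
pixels = np.zeros((2, 2), dtype=pixel_dtype)
pixels["coherence"] = [[0.9, 0.4], [0.7, 0.1]]
pixels["incidence_angle"] = [[34.0, 34.5], [35.0, 35.5]]
pixels["valid"] = [[1, 0], [1, 1]]

with h5py.File("example_compound.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("pixel_records", data=pixels)
    field_names = dset.dtype.names
    stored = dset[...]

# Individual fields are selected by name after reading.
print(field_names)
print(stored["coherence"])
```

Keeping related values in one record guarantees they stay aligned per pixel, at the cost of reading every field when only one is needed.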
Enumeration datatypes map integer values to a predefined set of named labels, improving user readability and consistency.
NISAR example: using named labels such as nominal, low_quality, or invalid to represent processing or quality states instead of raw numeric codes.
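A small h5py sketch of an enumeration datatype follows. The labels come from the example above, but the integer codes (0, 1, 2), the 8-bit base type, and the dataset name are assumptions chosen for illustration.

```python
import h5py
import numpy as np

# Hypothetical label-to-code mapping; the numeric codes are assumed.
quality_states = {"nominal": 0, "low_quality": 1, "invalid": 2}
quality_dtype = h5py.enum_dtype(quality_states, basetype="u1")

with h5py.File("example_enum.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("quality", shape=(4,), dtype=quality_dtype)
    dset[...] = [0, 0, 1, 2]

    # The mapping is stored with the datatype and can be recovered on read.
    mapping = h5py.check_enum_dtype(dset.dtype)
    codes = dset[...]

# Translate stored integer codes back into readable labels.
code_to_label = {int(code): label for label, code in mapping.items()}
labels = [code_to_label[int(c)] for c in codes]
print(labels)
```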
Named Datatypes
Named datatypes (also called committed datatypes in HDF5) are stored as objects within an HDF5 file. Any datatype (atomic, derived, or composite) can be named and then referenced throughout the file. Naming allows datatypes to be:
Shared across datasets and attributes
Reused simply and consistently
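In h5py, assigning a NumPy dtype to a path in the file commits it as a named datatype, and the resulting object can be passed as the dtype when creating datasets. The sketch below assumes a hypothetical record layout and hypothetical object names.

```python
import h5py
import numpy as np

# Hypothetical record layout to be shared by several datasets.
record_dtype = np.dtype([("coherence", "f4"), ("valid", "u1")])

with h5py.File("example_named.h5", "w", driver="core", backing_store=False) as f:
    # Assigning a dtype to a name commits it to the file as a datatype object.
    f["pixel_record_type"] = record_dtype
    named = f["pixel_record_type"]
    is_named = isinstance(named, h5py.Datatype)

    # Both datasets are created from the same committed datatype.
    a = f.create_dataset("frequencyA_records", shape=(2,), dtype=named)
    b = f.create_dataset("frequencyB_records", shape=(3,), dtype=named)
    same_layout = bool(a.dtype == b.dtype == named.dtype)

print(is_named, same_layout)
```

Defining the layout once in this way keeps the two datasets consistent if the record definition is later inspected or reused.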