Nested HDF5 Data / HEC-RAS
I'm working on development of the rashdf library for reading HEC-RAS HDF5 data. A big part of the motivation for the library is stochastic hydrologic/hydraulic modeling.
We want to be able to generate Zarr metadata for stochastic HEC-RAS outputs, so that, e.g., results for many different stochastic flood simulations from a given RAS model can be opened as a single xarray Dataset. For example, results for 100 different simulations could be concatenated along a new `simulation`
dimension, with coordinates being the index number of each simulation. It took me a little while to figure out how to make that happen, because RAS HDF5 data is highly nested and doesn't conform to typical conventions.
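The concatenation described above can be sketched with plain xarray. The variable name `water_surface` and the array shapes here are invented for illustration; in practice each per-simulation dataset would be read from its own HEC-RAS HDF5 results file:

```python
import numpy as np
import xarray as xr

# Stand-ins for per-simulation RAS outputs; each dataset here is filled
# with its simulation index so the result is easy to inspect.
sims = [
    xr.Dataset({"water_surface": (("time", "cell"), np.full((3, 4), float(i)))})
    for i in range(100)
]

# Concatenate along a new "simulation" dimension, using the simulation
# index number as the coordinate.
combined = xr.concat(sims, dim="simulation")
combined = combined.assign_coords(simulation=np.arange(100))
```

With kerchunk-generated reference metadata, the goal is to get this same combined view lazily, without copying any of the underlying HDF5 chunk data.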
The way I implemented it is hacky:
- Given an `xr.Dataset` pulled from the HDF file and the path of each child `xr.DataArray` within the HDF file:
- Get the filters for each DataArray: `filters = SingleHdf5ToZarr._decode_filters(None, hdf_ds)`
- Get the storage info for each DataArray: `storage_info = SingleHdf5ToZarr._storage_info(None, hdf_ds)`
- Build out metadata for chunks using `storage_info`
- "Write" the `xr.Dataset` to a `zarr.MemoryStore` with `compute=False`, to generate the framework of what's needed for the Zarr metadata
- Read the objects generated by writing to `zarr.MemoryStore` and decode them
- Assemble the `zarr.MemoryStore` objects, `filters`, and `storage_info` into a dictionary and return it
I suppose my questions are:
- Is there a better way to approach highly nested or otherwise idiosyncratic HDF5 data with Kerchunk?
- Could Kerchunk's `SingleHdf5ToZarr._decode_filters` and `_storage_info` methods be made public?