Readers should raise an error for HDF files using the compact storage layout.
While investigating possible HDF storage scenarios for scalar values in https://github.com/zarr-developers/VirtualiZarr/pull/523, I discovered that HDF5 also supports a "compact" storage layout, where extremely small datasets or values (<64KB) are inlined into the file's object header (https://support.hdfgroup.org/documentation/hdf5/latest/_l_b_dset_layout.html). The HDF5 library provides no way to infer the offset and size of data stored using the compact layout, so we have no way of creating a ChunkManifest for such datasets and should raise an unsupported exception instead.
- [ ] Create a test fixture with a scalar stored in the compact storage layout using the low-level `h5py.h5d` API.
- [ ] Update `HDFVirtualBackend` to check the dataset's storage layout.
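The fixture step might look something like this sketch using h5py's low-level API (the file and dataset names are placeholders). It also demonstrates why a ChunkManifest is impossible here: `DatasetID.get_offset()` returns `None` for compact storage, so there is no byte offset to reference.

```python
import h5py
import numpy as np

# Create a file containing a scalar dataset with the compact storage layout.
# The high-level h5py API doesn't expose the compact layout, so we drop down
# to h5py.h5d / h5py.h5p / h5py.h5s. (File and dataset names are placeholders.)
with h5py.File("compact_scalar.h5", "w") as f:
    space = h5py.h5s.create(h5py.h5s.SCALAR)
    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
    dcpl.set_layout(h5py.h5d.COMPACT)  # inline the data into the object header
    dset_id = h5py.h5d.create(f.id, b"scalar", h5py.h5t.NATIVE_DOUBLE, space, dcpl)
    dset_id.write(h5py.h5s.ALL, h5py.h5s.ALL, np.array(3.14))

# Reading it back: the layout is compact, and there is no byte offset
# for a ChunkManifest to point at.
with h5py.File("compact_scalar.h5", "r") as f:
    dset = f["scalar"]
    layout = dset.id.get_create_plist().get_layout()
    offset = dset.id.get_offset()  # None for compact storage
    value = dset[()]

print(layout == h5py.h5d.COMPACT)
print(offset)
print(value)
```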
Hmm, this is potentially an issue with the whole "readers as creators of ManifestStores" idea. We can put this inlined data into a virtual dataset and into Icechunk, it just can't be a virtual variable. (Or at least the HDF library won't help us if we want to make that virtual variable.)
If reader implementations had the ability to say "nah actually you're getting this variable in memory" then we could deal with this situation gracefully.
A compromise might be to have the error message suggest explicitly loading that particular variable.
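The check could be as simple as the following sketch (the helper function and error message are hypothetical, not VirtualiZarr's actual API; `loadable_variables` is the existing `open_virtual_dataset` parameter for loading variables into memory):

```python
import h5py

def check_dataset_layout(dset: h5py.Dataset) -> None:
    """Raise if a dataset uses the compact storage layout, since there is
    no byte offset/size to reference from a ChunkManifest.

    Hypothetical helper for illustration -- not VirtualiZarr's actual API.
    """
    layout = dset.id.get_create_plist().get_layout()
    if layout == h5py.h5d.COMPACT:
        raise NotImplementedError(
            f"Dataset {dset.name!r} uses HDF5's compact storage layout, "
            "which cannot be referenced as a virtual variable. "
            "Consider loading it into memory instead, e.g. via "
            "loadable_variables."
        )
```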
@TomNicholas I think including the suggestion to load the problematic variables is probably the way to go 👍. I'm hopeful that this will be a fairly infrequent case.
BTW a similar thing would happen if you try to load kerchunk references that are inlined into the kerchunk reference file. But in that case it's easier to generate a reference to the data (which lives in the kerchunk JSON file itself).
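For context, here is a sketch of what an inlined kerchunk reference looks like (illustrative keys and values, not taken from any real file): in the version-1 reference format a chunk key maps either to a `[url, offset, length]` triple or to the chunk bytes themselves, with binary data base64-encoded behind a `"base64:"` prefix.

```python
import base64
import json
import struct

# Sketch of kerchunk's reference format: most chunk keys point at byte
# ranges in an external file, but tiny chunks can be inlined directly
# into the JSON, base64-encoded if binary. (All names/paths are made up.)
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        # normal virtual reference: [url, offset, length]
        "temp/0.0": ["s3://bucket/data.nc", 20134, 12288],
        # inlined reference: the chunk bytes (here, 3.14 as a
        # little-endian float64) live inside the reference file itself
        "scalar/0": "base64:" + base64.b64encode(struct.pack("<d", 3.14)).decode(),
    },
}
print(json.dumps(refs, indent=2))
```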