earth2studio

🐛[BUG]: h5py fault

Open NickGeneva opened this issue 2 months ago • 2 comments

Version

main

On which installation method(s) does this occur?

source

Describe the issue

The most recent h5py update (3.15.0) seems to break some of our xarray loads of netcdf files in the CI environment.

It's unknown what the underlying issue is, but for now the recommendation is to downgrade h5py to 3.14.0. Switching the xarray engine from h5netcdf to netcdf4 is a workaround for NC files.

I have not locked the h5py range, since this seems to surface in only a few tests and the new uv.lock file should select h5py 3.14.0. If you encounter a similar issue, downgrade h5py.
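A minimal sketch of the engine workaround, for anyone who cannot downgrade right away. The helper names (`choose_engine`, `installed_h5py_version`, `parse_version`) are hypothetical and not part of earth2studio or xarray, and the cutoff assumes 3.15.0 is the first affected h5py release; whether later releases behave differently is not known here.

```python
from __future__ import annotations

from importlib import metadata

# Assumption from this issue: h5py 3.15.0 is the first release showing the fault.
FIRST_AFFECTED = (3, 15, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '3.15.0' into (3, 15, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions like '3.15' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def choose_engine(h5py_version: str | None) -> str:
    """Hypothetical helper: pick an xarray engine name for NetCDF reads."""
    if h5py_version is None:
        # Without h5py the h5netcdf engine is unavailable anyway.
        return "netcdf4"
    if parse_version(h5py_version) >= FIRST_AFFECTED:
        return "netcdf4"
    return "h5netcdf"


def installed_h5py_version() -> str | None:
    """Return the installed h5py version string, or None if it is absent."""
    try:
        return metadata.version("h5py")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    engine = choose_engine(installed_h5py_version())
    print(f"engine={engine}")
    # The result can then be passed on, e.g. xr.open_dataset(path, engine=engine).
```

The netcdf4 engine requires the netCDF4 package to be installed, so this only helps in environments that already have it.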

Test failure looks like:

RuntimeError: Unspecified error in H5DSget_num_scales (return value <0)
Task exception was never retrieved
future: <Task finished name='Task-1678' coro=<tqdm_asyncio.gather.<locals>.wrap_awaitable() done, defined at /usr/local/lib/python3.12/dist-packages/tqdm/asyncio.py:75> exception=RuntimeError('Unspecified error in H5DSget_num_scales (return value <0)')>
Traceback (most recent call last):
  File "/usr/lib/python3.12/asyncio/tasks.py", line 316, in __step_run_and_handle_result
    result = coro.throw(exc)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
    return i, await f
              ^^^^^^^
  File "/builds/modulus/earth-2/earth2studio/src/earth2studio/data/ncar.py", line 274, in fetch_wrapper
    out = await self.fetch_array(
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/modulus/earth-2/earth2studio/src/earth2studio/data/ncar.py", line 336, in fetch_array
    ds = await asyncio.to_thread(
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/futures.py", line 287, in __await__
    yield self  # This tells Task to wait for completion.
    ^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/tasks.py", line 385, in __wakeup
    future.result()
  File "/usr/lib/python3.12/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/xarray/backends/api.py", line 596, in open_dataset
    backend_ds = backend.open_dataset(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/xarray/backends/h5netcdf_.py", line 502, in open_dataset
    store = H5NetCDFStore.open(
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/xarray/backends/h5netcdf_.py", line 225, in open
    manager = manager_cls(h5netcdf.File, filename, mode=mode, kwargs=kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/xarray/backends/file_manager.py", line 370, in __init__
    self._file: T_File | None = opener(*args, **kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/h5netcdf/core.py", line 1607, in __init__
    super().__init__(self, self._h5path)
  File "/usr/local/lib/python3.12/dist-packages/h5netcdf/core.py", line 911, in __init__
    if _unlabeled_dimension_mix(v) == "unlabeled":
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/h5netcdf/core.py", line 680, in _unlabeled_dimension_mix
    dimset = {len(j) for j in dimlist}
              ^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python3.12/dist-packages/h5py/_hl/dims.py", line 60, in __len__
    return h5ds.get_num_scales(self._id, self._dimension)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5ds.pyx", line 71, in h5py.h5ds.get_num_scales
  File "h5py/defs.pyx", line 4282, in h5py.defs.H5DSget_num_scales
RuntimeError: Unspecified error in H5DSget_num_scales (return value <0)

MRE:


import numpy as np
import xarray as xr


def main():
    # Build a small synthetic dataset with labeled time/lat/lon dimensions.
    np.random.seed(42)
    data = np.random.randn(10, 20, 30)
    ds = xr.Dataset(
        {
            "data": (["time", "lat", "lon"], data),
        },
        coords={
            "time": np.arange(10),
            "lat": np.linspace(-90, 90, 20),
            "lon": np.linspace(-180, 180, 30),
        },
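    )

    # Write with the netcdf4 engine, then read back with h5netcdf.
    # The h5netcdf read is the step that fails in the traceback above.
    filename = "test.nc"
    ds.to_netcdf(filename, engine="netcdf4")
    ds_loaded = xr.open_dataset(filename, engine="h5netcdf")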

    print("\nLoaded dataset:")
    print(ds_loaded)

if __name__ == "__main__":
    main()

NickGeneva • Oct 14 '25 22:10

xref: https://github.com/h5py/h5py/issues/2726

NickGeneva • Oct 14 '25 22:10

Appears to be some mismatch between the recent releases of netcdf4 and h5py.

https://github.com/Unidata/netcdf4-python/issues/1438
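When comparing an environment against the upstream reports, it may help to record which releases of the relevant packages are installed. The sketch below uses only the standard library; `collect_versions` is a hypothetical helper, and the package list is an assumption about which distributions matter here.

```python
from __future__ import annotations

from importlib import metadata

# PyPI distribution names; netcdf4-python is published as "netCDF4".
PACKAGES = ("h5py", "netCDF4", "h5netcdf", "xarray")


def collect_versions(names: tuple[str, ...] = PACKAGES) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: dict[str, str | None] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```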

NickGeneva • Oct 15 '25 15:10