🐛[BUG]: h5py fault
Version
main
On which installation method(s) does this occur?
source
Describe the issue
The most recent h5py release (3.15.0) appears to break some of our xarray loads of NetCDF files in the CI environment.
The root cause is unknown, but for now the recommendation is to downgrade h5py to 3.14.0.
Switching the xarray engine from h5netcdf to netcdf4 is a workaround for NC files.
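The engine switch can be wrapped in a small fallback helper. This is a minimal sketch, not what the project does today: `open_with_engine_fallback` and its `opener` parameter are hypothetical names introduced here, and it assumes the failure surfaces as a `RuntimeError`, as it does in the traceback in this report.

```python
from typing import Any, Callable, Optional, Sequence


def open_with_engine_fallback(
    path: str,
    engines: Sequence[str] = ("h5netcdf", "netcdf4"),
    opener: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
) -> Any:
    """Try each xarray engine in order and return the first dataset that opens.

    `opener` is a hypothetical hook for testing; it defaults to xarray.open_dataset.
    """
    if not engines:
        raise ValueError("at least one engine is required")
    if opener is None:
        # Imported lazily so the helper can be exercised without xarray installed.
        import xarray as xr

        opener = xr.open_dataset

    last_error: Optional[RuntimeError] = None
    for engine in engines:
        try:
            return opener(path, engine=engine, **kwargs)
        except RuntimeError as err:
            # The H5DSget_num_scales failure in this report is a RuntimeError.
            last_error = err
    assert last_error is not None
    raise last_error
```

Calling `open_with_engine_fallback("test.nc")` would try h5netcdf first and only fall back to netcdf4 if that raises. Catching `RuntimeError` broadly may hide unrelated failures, so this is better suited as a temporary shim than a permanent fix.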
I have not pinned the h5py version range, since this only seems to surface in a few tests and the new uv.lock file should select h5py 3.14.0. If you encounter a similar issue, downgrade h5py.
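To tell quickly whether an environment picked up the affected release, a guard along these lines may help. It is only a sketch: `parse_release` and `is_known_bad_h5py` are hypothetical helpers, and it treats 3.15.0 as the only affected version because that is the only one identified in this report.

```python
from importlib import metadata
from typing import Optional, Tuple

KNOWN_BAD_H5PY = (3, 15, 0)  # the only release this report identifies as affected


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse up to three leading numeric components of a version string."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_known_bad_h5py(version: Optional[str] = None) -> bool:
    """Return True if the given (or installed) h5py version is 3.15.0."""
    if version is None:
        try:
            version = metadata.version("h5py")
        except metadata.PackageNotFoundError:
            return False  # h5py is not installed, so it cannot be the affected release
    return parse_release(version) == KNOWN_BAD_H5PY


if __name__ == "__main__":
    if is_known_bad_h5py():
        print("h5py 3.15.0 detected; consider installing h5py==3.14.0")
```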
Test failure looks like:
RuntimeError: Unspecified error in H5DSget_num_scales (return value <0)
Task exception was never retrieved
future: <Task finished name='Task-1678' coro=<tqdm_asyncio.gather.<locals>.wrap_awaitable() done, defined at /usr/local/lib/python3.12/dist-packages/tqdm/asyncio.py:75> exception=RuntimeError('Unspecified error in H5DSget_num_scales (return value <0)')>
Traceback (most recent call last):
File "/usr/lib/python3.12/asyncio/tasks.py", line 316, in __step_run_and_handle_result
result = coro.throw(exc)
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
return i, await f
^^^^^^^
File "/builds/modulus/earth-2/earth2studio/src/earth2studio/data/ncar.py", line 274, in fetch_wrapper
out = await self.fetch_array(
^^^^^^^^^^^^^^^^^^^^^^^
File "/builds/modulus/earth-2/earth2studio/src/earth2studio/data/ncar.py", line 336, in fetch_array
ds = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/futures.py", line 287, in __await__
yield self # This tells Task to wait for completion.
^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/tasks.py", line 385, in __wakeup
future.result()
File "/usr/lib/python3.12/asyncio/futures.py", line 203, in result
raise self._exception.with_traceback(self._exception_tb)
File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/xarray/backends/api.py", line 596, in open_dataset
backend_ds = backend.open_dataset(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/xarray/backends/h5netcdf_.py", line 502, in open_dataset
store = H5NetCDFStore.open(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/xarray/backends/h5netcdf_.py", line 225, in open
manager = manager_cls(h5netcdf.File, filename, mode=mode, kwargs=kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/xarray/backends/file_manager.py", line 370, in __init__
self._file: T_File | None = opener(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/h5netcdf/core.py", line 1607, in __init__
super().__init__(self, self._h5path)
File "/usr/local/lib/python3.12/dist-packages/h5netcdf/core.py", line 911, in __init__
if _unlabeled_dimension_mix(v) == "unlabeled":
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/h5netcdf/core.py", line 680, in _unlabeled_dimension_mix
dimset = {len(j) for j in dimlist}
^^^^^^
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/usr/local/lib/python3.12/dist-packages/h5py/_hl/dims.py", line 60, in __len__
return h5ds.get_num_scales(self._id, self._dimension)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5ds.pyx", line 71, in h5py.h5ds.get_num_scales
File "h5py/defs.pyx", line 4282, in h5py.defs.H5DSget_num_scales
RuntimeError: Unspecified error in H5DSget_num_scales (return value <0)
MRE:
import numpy as np
import xarray as xr


def main():
    np.random.seed(42)
    data = np.random.randn(10, 20, 30)
    ds = xr.Dataset(
        {
            "data": (["time", "lat", "lon"], data),
        },
        coords={
            "time": np.arange(10),
            "lat": np.linspace(-90, 90, 20),
            "lon": np.linspace(-180, 180, 30),
        },
    )
    filename = "test.nc"
    ds.to_netcdf(filename, engine="netcdf4")
    ds_loaded = xr.open_dataset(filename, engine="h5netcdf")
    print("\nLoaded dataset:")
    print(ds_loaded)


if __name__ == "__main__":
    main()
xref: https://github.com/h5py/h5py/issues/2726
This appears to be a mismatch between the recent releases of netcdf4 and h5py:
https://github.com/Unidata/netcdf4-python/issues/1438
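When reporting a similar failure, it may help to include the versions of the packages involved. The snippet below is one possible way to collect them; `installed_versions` and the `PACKAGES` tuple are hypothetical, and the list of packages is an assumption based on the libraries that appear in the traceback and MRE.

```python
from importlib import metadata
from typing import Dict, Sequence

# Distribution names of the libraries that appear in this report.
PACKAGES = ("h5py", "netCDF4", "h5netcdf", "xarray")


def installed_versions(packages: Sequence[str] = PACKAGES) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```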