
HDF5 error with netcdf4 1.7.3 and h5py on Linux with PyPI

Open • shumpohl opened this issue 2 months ago • 15 comments

Our CI broke today-ish with the error below on Python 3.12 and 3.13. We do not test with 3.14.

I cannot reproduce the error on Windows. The error only appears when h5py is installed and imported.

Limiting the version to "netcdf4<1.7.3" is our current workaround.
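
With uv, as in the reproducer below, the pin can be applied as, e.g.:

uv pip install xarray "netcdf4<1.7.3" h5py==3.15.0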

A minimal reproducer that leverages xarray:

uv pip install xarray netcdf4==1.7.3 h5py==3.15.0

import xarray as xr
import h5py

ds = xr.Dataset({"rootvar": ("x", [100, 200])})
ds.to_netcdf('./test_out.nc', engine='netcdf4', format="NETCDF4")
CI error message:
dt = <xarray.DataTree 'first'>
Group: /
│   Attributes:
│       elicit_md:  b'{"gate":{"gate_1":{"data":0.1,"time":176...
│       uuid:       0d796dfb-60ef-42bc-9a7c-0f8aa5ca4e99
│       uuid_root:  ded3e81e-260f-47c5-b860-a0e233cc93fd
filepath = '/tmp/pytest-of-root/pytest-0/test_simple_experiment_with_xr0/2025-10-15 10_19_22_63 first.nc'
mode = 'w', encoding = {'/foo': {}}, unlimited_dims = {}, format = 'NETCDF4'
engine = 'netcdf4', group = None, write_inherited_coords = False, compute = True
invalid_netcdf = False, auto_complex = None
    def _datatree_to_netcdf(
        dt: DataTree,
        filepath: str | PathLike | io.IOBase | None = None,
        mode: NetcdfWriteModes = "w",
        encoding: Mapping[str, Any] | None = None,
        unlimited_dims: Mapping | None = None,
        format: T_DataTreeNetcdfTypes | None = None,
        engine: T_DataTreeNetcdfEngine | None = None,
        group: str | None = None,
        write_inherited_coords: bool = False,
        compute: bool = True,
        invalid_netcdf: bool = False,
        auto_complex: bool | None = None,
    ) -> None | memoryview | Delayed:
        """Implementation of `DataTree.to_netcdf`."""

        if format not in [None, *get_args(T_DataTreeNetcdfTypes)]:
            raise ValueError("DataTree.to_netcdf only supports the NETCDF4 format")

        if engine not in [None, *get_args(T_DataTreeNetcdfEngine)]:
            raise ValueError(
                "DataTree.to_netcdf only supports the netcdf4 and h5netcdf engines"
            )

        normalized_path = _normalize_path(filepath)

        if engine is None:
            engine = get_default_netcdf_write_engine(
                path_or_file=normalized_path,
                format="NETCDF4",  # required for supporting groups
            )  # type: ignore[assignment]

        if group is not None:
            raise NotImplementedError(
                "specifying a root group for the tree has not been implemented"
            )

        if encoding is None:
            encoding = {}

        # In the future, we may want to expand this check to insure all the provided encoding
        # options are valid. For now, this simply checks that all provided encoding keys are
        # groups in the datatree.
        if set(encoding) - set(dt.groups):
            raise ValueError(
                f"unexpected encoding group name(s) provided: {set(encoding) - set(dt.groups)}"
            )

        if normalized_path is None:
            if not compute:
                raise NotImplementedError(
                    "to_netcdf() with compute=False is not yet implemented when "
                    "returning a memoryview"
                )
            target = BytesIOProxy()
        else:
            target = normalized_path  # type: ignore[assignment]

        if unlimited_dims is None:
            unlimited_dims = {}

        scheduler = get_dask_scheduler()
        have_chunks = any(
            v.chunks is not None for node in dt.subtree for v in node.variables.values()
        )
        autoclose = have_chunks and scheduler in ["distributed", "multiprocessing"]

        root_store = get_writable_netcdf_store(
            target,
            engine,  # type: ignore[arg-type]
            mode=mode,
            format=format,
            autoclose=autoclose,
            invalid_netcdf=invalid_netcdf,
            auto_complex=auto_complex,
        )

        writer = ArrayWriter()

        # TODO: allow this work (setting up the file for writing array data)
        # to be parallelized with dask
        try:
            for node in dt.subtree:
                at_root = node is dt
                dataset = node.to_dataset(inherit=write_inherited_coords or at_root)
                node_store = (
                    root_store if at_root else root_store.get_child_store(node.path)
                )
>               dump_to_store(
                    dataset,
                    node_store,
                    writer,
                    encoding=encoding.get(node.path),
                    unlimited_dims=unlimited_dims.get(node.path),
                )
$VENV/lib/python3.12/site-packages/xarray/backends/writers.py:899:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
$VENV/lib/python3.12/site-packages/xarray/backends/writers.py:491: in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
$VENV/lib/python3.12/site-packages/xarray/backends/common.py:533: in store
    self.set_variables(
$VENV/lib/python3.12/site-packages/xarray/backends/common.py:575: in set_variables
    writer.add(source, target)
$VENV/lib/python3.12/site-packages/xarray/backends/common.py:403: in add
    target[...] = source
    ^^^^^^^^^^^
$VENV/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:95: in __setitem__
    data[key] = value
    ^^^^^^^^^
src/netCDF4/_netCDF4.pyx:5645: in netCDF4._netCDF4.Variable.__setitem__
    ???
src/netCDF4/_netCDF4.pyx:5932: in netCDF4._netCDF4.Variable._put
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   RuntimeError: NetCDF: HDF error
src/netCDF4/_netCDF4.pyx:2160: RuntimeError

shumpohl commented Oct 15 '25

If you are installing netcdf4 from PyPI: we are seeing the same issues for Python 3.10-3.13, and even with 1.7.2 on Python 3.10 (only, this time around), via netCDF4-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl. Note: in our case this has nothing to do with xarray.

valeriupredoi commented Oct 15 '25

There were many updates in netCDF4-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, but if I had to guess one to blame, it would be the HDF5 bump from 1.14.2 to 1.14.6. Hard to debug when all of the tests on this side are passing :-/

ocefpaf commented Oct 15 '25

I am trying to get a minimal reproducer. Right now it looks like the error is only triggered when another (yet unidentified) package is present. Otherwise I get the warning

  <frozen importlib._bootstrap>:488: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject

shumpohl commented Oct 15 '25

> I am trying to get a minimal reproducer. Right now it looks like the error is only triggered when another (yet unidentified) package is present.

h5py?

> Otherwise I get the warning
>
>   <frozen importlib._bootstrap>:488: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject

That is "mostly harmless."

ocefpaf commented Oct 15 '25

> h5py

Exactly. I guess it is a version mismatch?

shumpohl commented Oct 15 '25

> Exactly. I guess it is a version mismatch?

I'm pretty sure they are the same. This is, sadly, a "feature" of wheels :-( I'm not sure how to fix it, and that is the main reason why I use conda: one HDF5 for all packages that link to it. It saves us tons of issues.

ocefpaf commented Oct 15 '25
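
For anyone debugging this locally, a quick way to compare the HDF5 builds the two wheels bundle (both attributes below are part of the public h5py and netCDF4 APIs):

import h5py
import netCDF4

# HDF5 version h5py was built against
print("h5py    -> HDF5", h5py.version.hdf5_version)
# HDF5 and netcdf-c versions the netCDF4 wheel links against
print("netCDF4 -> HDF5", netCDF4.__hdf5libversion__)
print("netCDF4 -> netcdf-c", netCDF4.__netcdf4libversion__)

Even matching version strings do not rule out a clash, since each wheel ships its own private copy of the library, which is the wheel "feature" described above.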

FYI h5py 3.15.0 was released about the same time as netcdf4 1.7.3, and we're seeing a number of similar issues there. Specifically, though, h5py appears to break only on macOS, and we have already identified a solution.

xref https://github.com/h5py/h5py/issues/2726

neutrinoceros commented Oct 15 '25

> that is the main reason why I use conda: one HDF5 for all packages that link to it. It saves us tons of issues.

+100 and a few 🍺 s for that from me 😁

valeriupredoi commented Oct 15 '25

Reposting from the h5py repo here in case there's any useful information. I am encountering a similar issue. I'm using a uv environment on Linux; the following produces an error when using the latest netcdf4 and h5py:

  • h5py 3.15.0 + netcdf4 1.7.3 = Error
  • h5py 3.14.0 + netcdf4 1.7.3 = Good
  • h5py 3.15.0 + netcdf4 1.7.2 = Good

import numpy as np
import xarray as xr


def main():
    
    np.random.seed(42)
    data = np.random.randn(10, 20, 30)
    ds = xr.Dataset(
        {
            "data": (["time", "lat", "lon"], data),
        },
        coords={
            "time": np.arange(10),
            "lat": np.linspace(-90, 90, 20),
            "lon": np.linspace(-180, 180, 30),
        },
    )
    
    filename = "test.nc"
    # write with the netCDF4 engine, then read back with h5netcdf;
    # this cross-engine round trip is what triggers the failure here
    ds.to_netcdf(filename, engine="netcdf4")
    ds_loaded = xr.open_dataset(filename, engine="h5netcdf")

    print("\nLoaded dataset:")
    print(ds_loaded)

if __name__ == "__main__":
    main()

(Making both engines "h5netcdf" or both "netcdf4" also fixes the issue for me; see the single-engine sketch after this comment.)

When it fails, it generates the following error:
Traceback (most recent call last):
  File "/home/ngeneva/Documents/repos/temp/h5py-test/main.py", line 30, in <module>
    main()
    ~~~~^^
  File "/home/ngeneva/Documents/repos/temp/h5py-test/main.py", line 24, in main
    ds_loaded = xr.open_dataset(filename, engine="h5netcdf")
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/api.py", line 596, in open_dataset
    backend_ds = backend.open_dataset(
        filename_or_obj,
    ...<2 lines>...
        **kwargs,
    )
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 502, in open_dataset
    store = H5NetCDFStore.open(
        filename_or_obj,
    ...<8 lines>...
        storage_options=storage_options,
    )
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 226, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 149, in __init__
    self._filename = find_root_and_group(self.ds)[0].filename
                                         ^^^^^^^
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 237, in ds
    return self._acquire()
           ~~~~~~~~~~~~~^^
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 229, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/home/ngeneva/.local/share/uv/python/cpython-3.13.2-linux-x86_64-gnu/lib/python3.13/contextlib.py", line 141, in __enter__
    return next(self.gen)
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/file_manager.py", line 207, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/file_manager.py", line 225, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/h5netcdf/core.py", line 1607, in __init__
    super().__init__(self, self._h5path)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/h5netcdf/core.py", line 911, in __init__
    if _unlabeled_dimension_mix(v) == "unlabeled":
       ~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/h5netcdf/core.py", line 680, in _unlabeled_dimension_mix
    dimset = {len(j) for j in dimlist}
              ~~~^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/h5py/_hl/dims.py", line 60, in __len__
    return h5ds.get_num_scales(self._id, self._dimension)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5ds.pyx", line 71, in h5py.h5ds.get_num_scales
  File "h5py/defs.pyx", line 4282, in h5py.defs.H5DSget_num_scales
RuntimeError: Unspecified error in H5DSget_num_scales (return value <0)

NickGeneva commented Oct 15 '25
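
For reference, a minimal sketch of the single-engine workaround mentioned above: write and read with the same engine, so only one of the two bundled HDF5 libraries ever touches the file.

import numpy as np
import xarray as xr

ds = xr.Dataset({"data": (["x"], np.arange(3.0))})

# same engine for both the write and the read avoids the
# cross-engine round trip that fails above
ds.to_netcdf("test.nc", engine="h5netcdf")
ds_loaded = xr.open_dataset("test.nc", engine="h5netcdf")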

The error @NickGeneva shows (from H5DSget_num_scales while reading through h5py) is different from the original error @shumpohl showed (raised while writing with netCDF4). They may well be related, of course, but don't assume they're the same thing.

I think xarray imports the modules for its engines lazily when you try to use them. So the original reproducer is importing h5py before netCDF4, and @NickGeneva's example is the other way around (netCDF4 then h5py).

takluyver commented Oct 15 '25
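
A direct way to test the import-order hypothesis is to flip the order from the original reproducer, e.g. (a sketch, assuming the same pinned versions):

# force netCDF4 (and the HDF5 it bundles) to load before h5py,
# reversing the order from the original reproducer
import netCDF4  # noqa: F401
import h5py  # noqa: F401
import xarray as xr

ds = xr.Dataset({"rootvar": ("x", [100, 200])})
ds.to_netcdf("./test_out.nc", engine="netcdf4", format="NETCDF4")

If the failure appears or disappears depending on which library is imported first, that points at the two bundled HDF5 copies interfering during initialization.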

h5py 3.15.1 is now available. Hopefully it will help clarify where the remaining issues are coming from.

neutrinoceros commented Oct 16 '25

I don't think 3.15.1 is going to make any difference to this issue, as it's been observed on Linux.

takluyver commented Oct 16 '25

Yes, I wouldn't expect it to, but at least macOS users have a more sane baseline to test against.

neutrinoceros commented Oct 16 '25

We are also having trouble with h5py==3.15.1 and netCDF4>=1.7.2: https://github.com/NREL/PVDegradationTools/issues/278

h5py==3.15.1 and netCDF4==1.7.2 works for Python >=3.11 but fails for Python 3.10: https://github.com/NREL/PVDegradationTools/actions/runs/18564993020/job/52927232647
h5py==3.15.1 and netCDF4==1.7.3 fails for all Python versions: https://github.com/NREL/PVDegradationTools/actions/runs/18566129336/job/52927606851?pr=282

E OSError: [Errno -101] NetCDF: HDF error: '/home/runner/work/PVDegradationTools/PVDegradationTools/tests/data/distributed_pvgis_weather.nc'

martin-springer commented Oct 16 '25

Has there been any progress on this? Would love to test Python 3.14, but can't upgrade h5py due to this bug.

adamjstewart commented Dec 09 '25