HDF5 error with netcdf4 1.7.3 and h5py on Linux with PyPI
Our CI broke today-ish with the error below on Python 3.12 and 3.13. We do not test with 3.14.
I cannot reproduce the error on Windows; it only appears when h5py is installed and imported.
Limiting the version to "netcdf4<1.7.3" is our current workaround.
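If you use uv, as in the reproducer below, the pin is a one-liner (adapt to wherever your dependencies are declared):
uv pip install "netcdf4<1.7.3"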
A minimal reproducer that leverages xarray:
uv pip install xarray netcdf4==1.7.3 h5py==3.15.0
import xarray as xr
import h5py  # the error only appears when h5py is installed and imported

ds = xr.Dataset({"rootvar": ("x", [100, 200])})
ds.to_netcdf('./test_out.nc', engine='netcdf4', format="NETCDF4")
CI error message
dt = <xarray.DataTree 'first'>
Group: /
│   Attributes:
│       elicit_md:  b'{"gate":{"gate_1":{"data":0.1,"time":176...
│       uuid:       0d796dfb-60ef-42bc-9a7c-0f8aa5ca4e99
│       uuid_root:  ded3e81e-260f-47c5-b860-a0e233cc93fd
filepath = '/tmp/pytest-of-root/pytest-0/test_simple_experiment_with_xr0/2025-10-15 10_19_22_63 first.nc'
mode = 'w', encoding = {'/foo': {}}, unlimited_dims = {}, format = 'NETCDF4'
engine = 'netcdf4', group = None, write_inherited_coords = False, compute = True
invalid_netcdf = False, auto_complex = None
def _datatree_to_netcdf(
dt: DataTree,
filepath: str | PathLike | io.IOBase | None = None,
mode: NetcdfWriteModes = "w",
encoding: Mapping[str, Any] | None = None,
unlimited_dims: Mapping | None = None,
format: T_DataTreeNetcdfTypes | None = None,
engine: T_DataTreeNetcdfEngine | None = None,
group: str | None = None,
write_inherited_coords: bool = False,
compute: bool = True,
invalid_netcdf: bool = False,
auto_complex: bool | None = None,
) -> None | memoryview | Delayed:
"""Implementation of `DataTree.to_netcdf`."""
if format not in [None, *get_args(T_DataTreeNetcdfTypes)]:
raise ValueError("DataTree.to_netcdf only supports the NETCDF4 format")
if engine not in [None, *get_args(T_DataTreeNetcdfEngine)]:
raise ValueError(
"DataTree.to_netcdf only supports the netcdf4 and h5netcdf engines"
)
normalized_path = _normalize_path(filepath)
if engine is None:
engine = get_default_netcdf_write_engine(
path_or_file=normalized_path,
format="NETCDF4", # required for supporting groups
) # type: ignore[assignment]
if group is not None:
raise NotImplementedError(
"specifying a root group for the tree has not been implemented"
)
if encoding is None:
encoding = {}
# In the future, we may want to expand this check to insure all the provided encoding
# options are valid. For now, this simply checks that all provided encoding keys are
# groups in the datatree.
if set(encoding) - set(dt.groups):
raise ValueError(
f"unexpected encoding group name(s) provided: {set(encoding) - set(dt.groups)}"
)
if normalized_path is None:
if not compute:
raise NotImplementedError(
"to_netcdf() with compute=False is not yet implemented when "
"returning a memoryview"
)
target = BytesIOProxy()
else:
target = normalized_path # type: ignore[assignment]
if unlimited_dims is None:
unlimited_dims = {}
scheduler = get_dask_scheduler()
have_chunks = any(
v.chunks is not None for node in dt.subtree for v in node.variables.values()
)
autoclose = have_chunks and scheduler in ["distributed", "multiprocessing"]
root_store = get_writable_netcdf_store(
target,
engine, # type: ignore[arg-type]
mode=mode,
format=format,
autoclose=autoclose,
invalid_netcdf=invalid_netcdf,
auto_complex=auto_complex,
)
writer = ArrayWriter()
# TODO: allow this work (setting up the file for writing array data)
# to be parallelized with dask
try:
for node in dt.subtree:
at_root = node is dt
dataset = node.to_dataset(inherit=write_inherited_coords or at_root)
node_store = (
root_store if at_root else root_store.get_child_store(node.path)
)
> dump_to_store(
dataset,
node_store,
writer,
encoding=encoding.get(node.path),
unlimited_dims=unlimited_dims.get(node.path),
)
$VENV/lib/python3.12/site-packages/xarray/backends/writers.py:899:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
$VENV/lib/python3.12/site-packages/xarray/backends/writers.py:491: in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
$VENV/lib/python3.12/site-packages/xarray/backends/common.py:533: in store
self.set_variables(
$VENV/lib/python3.12/site-packages/xarray/backends/common.py:575: in set_variables
writer.add(source, target)
$VENV/lib/python3.12/site-packages/xarray/backends/common.py:403: in add
target[...] = source
^^^^^^^^^^^
$VENV/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:95: in __setitem__
data[key] = value
^^^^^^^^^
src/netCDF4/_netCDF4.pyx:5645: in netCDF4._netCDF4.Variable.__setitem__
???
src/netCDF4/_netCDF4.pyx:5932: in netCDF4._netCDF4.Variable._put
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E RuntimeError: NetCDF: HDF error
src/netCDF4/_netCDF4.pyx:2160: RuntimeError
If you are installing netcdf4 from PyPI: we are seeing the same issues for Python 3.10-3.13, and even for 1.7.2 on Python 3.10 (only, this time around), with netCDF4-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl. Note: this has nothing to do with xarray in our case.
There were many updates in netCDF4-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, but if I had to guess one to blame, it would be the HDF5 bump from 1.14.2 to 1.14.6. It is hard to debug when all of the tests on this side are passing :-/
I am trying to get a minimal reproducer. Right now it looks like the error is only triggered when another (yet unidentified) package is present. Otherwise I get the warning:
<frozen importlib._bootstrap>:488: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject

> I am trying to get a minimal reproducer. Right now it looks like the error is only triggered when another (yet unidentified) package is present.

h5py?

> Otherwise I get the warning
> <frozen importlib._bootstrap>:488: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject

That is "mostly harmless."

> h5py

Exactly. I guess it is a version mismatch?

> exactly. I guess it is a version mismatch?

I'm pretty sure they are the same. This is, sadly, a "feature" of wheels :-( I'm not sure how to fix this, and that is the main reason why I use conda: one HDF5 for all packages that link to it. It saves us tons of issues.
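For anyone who wants to double-check that in their own environment, both bindings report the HDF5 version they are actually linked against at runtime:

import h5py
import netCDF4

# HDF5 version each binding is linked against at runtime
print("h5py   :", h5py.version.hdf5_version)
print("netCDF4:", netCDF4.__hdf5libversion__)

If those two differ inside a single environment, that alone is a red flag.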
FYI h5py 3.15.0 was released about the same time as netcdf4 1.7.3, and we are seeing a number of similar issues over there. The h5py breakage, though, appears to be macOS-only, and we have already identified a solution for it.
xref https://github.com/h5py/h5py/issues/2726
> that is the main reason why I use conda. One HDF5 for all packages that link to it. It saves us tons of issues.

+100 and a few 🍺s for that from me 😁
Reposting from the h5py repo here in case there's any useful information. I am encountering a similar issue. I'm using a uv environment on Linux; the following produces an error when using the latest netcdf4 and h5py:
- h5py 3.15.0 + netcdf4 1.7.3 = Error
- h5py 3.14.0 + netcdf4 1.7.3 = Good
- h5py 3.15.0 + netcdf4 1.7.2 = Good
import numpy as np
import xarray as xr


def main():
    np.random.seed(42)
    data = np.random.randn(10, 20, 30)
    ds = xr.Dataset(
        {
            "data": (["time", "lat", "lon"], data),
        },
        coords={
            "time": np.arange(10),
            "lat": np.linspace(-90, 90, 20),
            "lon": np.linspace(-180, 180, 30),
        },
    )
    filename = "test.nc"
    # Write with the netCDF4 engine, then read back with h5netcdf;
    # mixing the two engines is what triggers the failure below.
    ds.to_netcdf(filename, engine="netcdf4")
    ds_loaded = xr.open_dataset(filename, engine="h5netcdf")
    print("\nLoaded dataset:")
    print(ds_loaded)


if __name__ == "__main__":
    main()
(Making both engines "h5netcdf" or both "netcdf4" also fixes the issue for me.)
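For reference, the same-engine variant that works for me only changes the last two lines of main():

# Workaround: use the same engine for both the write and the read
ds.to_netcdf(filename, engine="h5netcdf")
ds_loaded = xr.open_dataset(filename, engine="h5netcdf")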
When this fails, it generates the error:
Traceback (most recent call last):
File "/home/ngeneva/Documents/repos/temp/h5py-test/main.py", line 30, in <module>
main()
~~~~^^
File "/home/ngeneva/Documents/repos/temp/h5py-test/main.py", line 24, in main
ds_loaded = xr.open_dataset(filename, engine="h5netcdf")
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/api.py", line 596, in open_dataset
backend_ds = backend.open_dataset(
filename_or_obj,
...<2 lines>...
**kwargs,
)
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 502, in open_dataset
store = H5NetCDFStore.open(
filename_or_obj,
...<8 lines>...
storage_options=storage_options,
)
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 226, in open
return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 149, in __init__
self._filename = find_root_and_group(self.ds)[0].filename
^^^^^^^
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 237, in ds
return self._acquire()
~~~~~~~~~~~~~^^
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/h5netcdf_.py", line 229, in _acquire
with self._manager.acquire_context(needs_lock) as root:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/home/ngeneva/.local/share/uv/python/cpython-3.13.2-linux-x86_64-gnu/lib/python3.13/contextlib.py", line 141, in __enter__
return next(self.gen)
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/file_manager.py", line 207, in acquire_context
file, cached = self._acquire_with_cache_info(needs_lock)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/xarray/backends/file_manager.py", line 225, in _acquire_with_cache_info
file = self._opener(*self._args, **kwargs)
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/h5netcdf/core.py", line 1607, in __init__
super().__init__(self, self._h5path)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/h5netcdf/core.py", line 911, in __init__
if _unlabeled_dimension_mix(v) == "unlabeled":
~~~~~~~~~~~~~~~~~~~~~~~~^^^
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/h5netcdf/core.py", line 680, in _unlabeled_dimension_mix
dimset = {len(j) for j in dimlist}
~~~^^^
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/ngeneva/Documents/repos/temp/h5py-test/.venv/lib/python3.13/site-packages/h5py/_hl/dims.py", line 60, in __len__
return h5ds.get_num_scales(self._id, self._dimension)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5ds.pyx", line 71, in h5py.h5ds.get_num_scales
File "h5py/defs.pyx", line 4282, in h5py.defs.H5DSget_num_scales
RuntimeError: Unspecified error in H5DSget_num_scales (return value <0)
The error @NickGeneva shows (from H5DSget_num_scales, while reading through h5py) is different from the original error @shumpohl showed (while writing with netCDF4). They may well be related, of course, but don't assume they're the same thing.
I think xarray imports the modules for its engines lazily when you try to use them. So the original reproducer is importing h5py before netCDF4, and @NickGeneva's example is the other way around (netCDF4 then h5py).
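If the import-order theory is right, it should be possible to drop xarray from the reproducer entirely. An untested sketch, replaying the original write through netCDF4 directly after importing h5py first:

import h5py  # imported first, as in the original xarray reproducer
import netCDF4
import numpy as np

# Same write as the original example: one int variable on one dimension
nc = netCDF4.Dataset("test_out.nc", mode="w", format="NETCDF4")
nc.createDimension("x", 2)
var = nc.createVariable("rootvar", "i8", ("x",))
var[:] = np.array([100, 200])  # the step that raised "NetCDF: HDF error" above
nc.close()

Swapping the first two imports would then test the opposite order.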
h5py 3.15.1 is now available. Hopefully it will help clarify where the remaining issues are coming from.
I don't think 3.15.1 is going to make any difference to this issue, as it's been observed on Linux.
Yes, I wouldn't expect it to, but at least macOS users have a more sane baseline to test against.
We are also having trouble with h5py==3.15.1 and netCDF4>=1.7.2: https://github.com/NREL/PVDegradationTools/issues/278
- h5py==3.15.1 + netCDF4==1.7.2 works for Python >= 3.11 but fails for Python 3.10: https://github.com/NREL/PVDegradationTools/actions/runs/18564993020/job/52927232647
- h5py==3.15.1 + netCDF4==1.7.3 fails for all Python versions: https://github.com/NREL/PVDegradationTools/actions/runs/18566129336/job/52927606851?pr=282
E OSError: [Errno -101] NetCDF: HDF error: '/home/runner/work/PVDegradationTools/PVDegradationTools/tests/data/distributed_pvgis_weather.nc'
Has there been any progress on this? Would love to test Python 3.14, but can't upgrade h5py due to this bug.