netcdf4-python
Many HDF5-DIAG warnings when used with xarray and dask
Steps to reproduce:
- Environment: Python 3.10, Ubuntu 22.04; the packages were installed in a new, clean virtual environment with pip install netcdf4 dask xarray
- Create a new netCDF4 dataset and read it with xarray and dask; see the example script below.
- Expected behavior: the script runs without warnings.
- Actual behavior: the script runs successfully but emits many HDF5-DIAG warnings.
pip list:
Package Version
------------------ --------
cftime 1.6.2
click 8.1.3
cloudpickle 2.2.1
dask 2023.3.2
fsspec 2023.3.0
importlib-metadata 6.1.0
locket 1.0.0
netCDF4 1.6.3
numpy 1.24.2
packaging 23.0
pandas 1.5.3
partd 1.3.0
pip 23.0.1
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
setuptools 67.4.0
six 1.16.0
toolz 0.12.0
wheel 0.38.4
xarray 2023.3.0
zipp 3.15.0
Example script (taken from https://github.com/pydata/xarray/issues/7549#issue-1596115847):
import argparse
import pathlib
import tempfile
from typing import List

import netCDF4
import xarray

HERE = pathlib.Path(__file__).parent


def add_arguments(parser: argparse.ArgumentParser):
    parser.add_argument('count', type=int, default=200, nargs='?')
    parser.add_argument('--file-cache-maxsize', type=int, required=False)


def main():
    parser = argparse.ArgumentParser()
    add_arguments(parser)
    opts = parser.parse_args()

    if opts.file_cache_maxsize is not None:
        xarray.set_options(file_cache_maxsize=opts.file_cache_maxsize)

    temp_dir = tempfile.mkdtemp(dir=HERE, prefix='work-dir-')
    work_dir = pathlib.Path(temp_dir)
    print("Working in", work_dir.name)

    print("Making", opts.count, "datasets")
    dataset_paths = make_many_datasets(work_dir, count=opts.count)

    print("Combining", len(dataset_paths), "datasets")
    # lock=False disables xarray's per-file lock, so dask reads the files concurrently.
    dataset = xarray.open_mfdataset(dataset_paths, lock=False)
    dataset.to_netcdf(work_dir / 'combined.nc')


def make_many_datasets(
    work_dir: pathlib.Path,
    count: int = 200,
) -> List[pathlib.Path]:
    dataset_paths = []
    for i in range(count):
        variable = f'var_{i}'
        path = work_dir / f'{variable}.nc'
        dataset_paths.append(path)
        make_dataset(path, variable)
    return dataset_paths


def make_dataset(
    path: pathlib.Path,
    variable: str,
) -> None:
    # Write a one-element NETCDF4 (HDF5-backed) file containing a single variable.
    ds = netCDF4.Dataset(path, "w", format="NETCDF4")
    ds.createDimension("x", 1)
    var = ds.createVariable(variable, "i8", ("x",))
    var[:] = 1
    ds.close()


if __name__ == '__main__':
    main()
Content of stdout:
Working in work-dir-kt4qcsng
Making 200 datasets
Combining 200 datasets
Content of stderr (partial):
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
#000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
major: Attribute
minor: Can't open object
#001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
major: Virtual Object Layer
minor: Can't open object
#002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
major: Virtual Object Layer
minor: Can't open object
#003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
major: Attribute
minor: Can't open object
#004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
major: Attribute
minor: Unable to initialize object
#005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitGroomNumberOfSignificantDigits'
major: Attribute
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
#000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
major: Attribute
minor: Can't open object
#001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
major: Virtual Object Layer
minor: Can't open object
#002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
major: Virtual Object Layer
minor: Can't open object
#003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
major: Attribute
minor: Can't open object
#004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
major: Attribute
minor: Unable to initialize object
#005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeGranularBitRoundNumberOfSignificantDigits'
major: Attribute
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
#000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
major: Attribute
minor: Can't open object
#001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
major: Virtual Object Layer
minor: Can't open object
#002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
major: Virtual Object Layer
minor: Can't open object
#003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
major: Attribute
minor: Can't open object
#004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
major: Attribute
minor: Unable to initialize object
#005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitRoundNumberOfSignificantBits'
major: Attribute
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 2:
#000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
major: Attribute
minor: Can't open object
...
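For context, the attribute names in these messages (_QuantizeBitGroomNumberOfSignificantDigits and friends) appear to come from netCDF-C's quantization feature: the library apparently probes for these attributes when a variable is opened, and on variables written without quantization (as in the script above) the lookup fails. A small sketch of how such an attribute would normally come into existence, using the significant_digits/quantize_mode arguments added in netCDF4 1.6.0 (the file name here is just for illustration):

# Sketch: write a quantized variable so that netCDF-C stores one of the
# _Quantize* attributes on it; plain integer variables (like var_0 ... var_199
# in the reproduction script) never carry these attributes.
import netCDF4

ds = netCDF4.Dataset("quantized_example.nc", "w", format="NETCDF4")
ds.createDimension("x", 1)
# significant_digits / quantize_mode require netCDF4 >= 1.6.0 and a float type.
var = ds.createVariable("var_q", "f8", ("x",),
                        significant_digits=3, quantize_mode="BitGroom")
var[:] = 1.0
ds.close()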
Downgrading to netcdf4=1.5.8 fixes the issue. I don't know whether the root cause is in netcdf4, xarray, or dask; I just wanted to raise awareness here. Probably related: https://github.com/Unidata/netcdf4-python/issues/1241
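For anyone comparing environments, the exact C library versions in use can be printed from Python via netCDF4's module-level version attributes, e.g.:

# Print the versions of the Python binding and of the underlying C libraries,
# since the behaviour seems to depend on the exact netCDF4 / HDF5 combination.
import netCDF4

print("netCDF4 (python):", netCDF4.__version__)
print("netcdf-c:        ", netCDF4.__netcdf4libversion__)
print("HDF5:            ", netCDF4.__hdf5libversion__)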
I was able to reproduce this with the conda packages as well. The output only went away after downgrading hdf5 to 1.12.1 :-/
Is there a solution to this yet? The massive diagnostic output (40k lines at a time) makes it really difficult to debug applications built with netcdf4.
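One blunt stopgap, independent of the root cause, is to silence the process-level stderr file descriptor around the offending calls: the HDF5-DIAG messages are written by the C library directly to file descriptor 2, so contextlib.redirect_stderr alone does not catch them. A minimal sketch (note that this hides all native stderr output, including real errors, while the block is active):

# Minimal sketch of a stderr-silencing context manager. This is a workaround for the
# noisy HDF5-DIAG output, not a fix: it hides *all* stderr written by native code
# while the block is active.
import contextlib
import os


@contextlib.contextmanager
def suppress_c_stderr():
    saved_fd = os.dup(2)                        # keep a copy of the real stderr fd
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)                  # point fd 2 at /dev/null
        yield
    finally:
        os.dup2(saved_fd, 2)                    # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)


# Hypothetical usage with the example script above:
# with suppress_c_stderr():
#     dataset = xarray.open_mfdataset(dataset_paths, lock=False)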