
Lazy saving to NetCDF4 fails randomly if an array is used multiple times

pnuu opened this issue 3 years ago · 1 comment

What happened?

Saving an xr.Dataset lazily to NetCDF4 (dset.to_netcdf(..., compute=False)) fails seemingly at random if an array is used either as a coordinate of multiple variables, or is saved under different names as a standalone variable. The traceback I get is shown in the log section below.
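For context, with compute=False the to_netcdf call does not write immediately; it returns a dask Delayed object, and the actual write happens when that object is computed. A minimal sketch of that pattern (file and variable names here are only illustrative; the full reproducer follows below):

import dask
import dask.array as da
import xarray as xr

# Dataset backed by a dask array, so the write can be deferred
dset = xr.Dataset({"data": (("y", "x"), da.random.random((10, 20)))})
delayed_write = dset.to_netcdf("out.nc", compute=False)  # returns a dask Delayed object
dask.compute(delayed_write)                              # the actual write happens here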

What did you expect to happen?

The saving should work consistently between different runs.

Minimal Complete Verifiable Example

#!/usr/bin/env python

import datetime as dt

import numpy as np
import dask.array as da
import xarray as xr

COMPUTE = False
FNAME = "xr_test.nc"


def main():
    y = np.arange(1000, dtype=np.uint16)
    x = np.arange(2000, dtype=np.uint16)

    # Create a time array that is used as a Y-coordinate for the data
    now = dt.datetime.utcnow()
    time_arr = np.array([now + dt.timedelta(seconds=i) for i in range(y.size)], dtype=np.datetime64)
    times = xr.DataArray(time_arr, coords={'y': y})

    # Write root
    root = xr.Dataset({}, attrs={'global': 'attribute'})
    written = [root.to_netcdf(FNAME, mode='w')]

    # Write first dataset
    data1 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset1 = xr.Dataset({'data1': data1})
    written.append(dset1.to_netcdf(FNAME, mode='a', compute=COMPUTE))

    # Write second dataset using the same time coordinates
    data2 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset2 = xr.Dataset({'data2': data2})
    written.append(dset2.to_netcdf(FNAME, mode='a', compute=COMPUTE))

    if not COMPUTE:
        da.compute(written)


if __name__ == "__main__":
    main()

Relevant log output

Traceback (most recent call last):
  File "/home/lahtinep/bin/test_lazy_netcdf_saving.py", line 43, in <module>
    main()
  File "/home/lahtinep/bin/test_lazy_netcdf_saving.py", line 39, in main
    da.compute(written)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/base.py", line 571, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/local.py", line 507, in get_async
    raise_exception(exc, tb)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/local.py", line 315, in reraise
    raise exc
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/array/core.py", line 4099, in store_chunk
    return load_store_chunk(x, out, index, lock, return_stored, False)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/array/core.py", line 4086, in load_store_chunk
    out[index] = x
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 69, in __setitem__
    data[key] = value
  File "src/netCDF4/_netCDF4.pyx", line 4903, in netCDF4._netCDF4.Variable.__setitem__
  File "src/netCDF4/_netCDF4.pyx", line 4073, in netCDF4._netCDF4.Variable.shape.__get__
  File "src/netCDF4/_netCDF4.pyx", line 3462, in netCDF4._netCDF4.Dimension.__len__
  File "src/netCDF4/_netCDF4.pyx", line 1927, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID

Anything else we need to know?

The above script fails randomly, so it should be run several times to reproduce the problem. Out of ten runs I got the traceback twice. If COMPUTE = True, the script works every time (after ~100 tries, at least).

The same behaviour is seen if the time coordinates are removed completely and data1 is also used in dset2 in place of data2, as sketched below.
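For clarity, a sketch of that simplified variant (my reconstruction of the described change, reusing y, x, FNAME, COMPUTE and written from the script above; not the exact code that was run):

    # No time coordinate; the same DataArray object is written under two names
    data1 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x})
    dset1 = xr.Dataset({'data1': data1})
    written.append(dset1.to_netcdf(FNAME, mode='a', compute=COMPUTE))

    dset2 = xr.Dataset({'data2': data1})  # reuse data1 instead of creating data2
    written.append(dset2.to_netcdf(FNAME, mode='a', compute=COMPUTE))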

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-30-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.2
pandas: 1.3.5
numpy: 1.22.0
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.0
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 8.0.0
sphinx: 4.3.2

pnuu (Feb 24 '22)

I experience the same problem under the same circumstances. My versions:

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.12.1.el8_4.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 0.19.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: None
dask: 2021.12.0
distributed: 2022.9.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: None
IPython: 8.5.0
sphinx: None

gerritholl (Oct 12 '22)