
test_open_nczarr uses too much memory

Open QuLogic opened this issue 1 year ago • 2 comments

What happened?

I'm updating the Fedora builds to 2022.06.0, and when running the tests the process runs out of memory and is OOM-killed. Running all the steps manually, execution gets as far as the to_netcdf call in createTestNCZarr._create_nczarr. At that point, memory usage rises to 8.9G resident / 17.7G virtual, and the process is killed (in fact, the entire SSH session is killed).
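
For anyone trying to reproduce this without losing their session, here is a minimal sketch, assuming a POSIX system where RLIMIT_AS is enforced. It runs a snippet in a child Python process whose address space is capped, so a runaway allocation should fail inside the child rather than trigger systemd-oomd. The helper name run_with_memory_cap is hypothetical and not part of xarray or netCDF.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(code, max_bytes):
    """Run `code` in a child Python whose address space is capped (POSIX only)."""

    def apply_cap():
        # Runs in the child just before exec; only the soft limit is lowered,
        # and it is clamped so it never exceeds the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    result = subprocess.run([sys.executable, "-c", code], preexec_fn=apply_cap)
    # A negative return code means the child was terminated by a signal.
    return result.returncode
```

Passing the example from this report as `code` with a cap of a few GiB (for instance 4 * 1024**3) should keep the damage inside the child. How netCDF reacts to the failed allocation (a Python exception or an abort) is untested here.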

What did you expect to happen?

Tests pass without issue.

Minimal Complete Verifiable Example

import xarray as xr
from xarray.tests.test_dataset import create_test_data
ds = create_test_data()
ds = ds.drop_vars("dim3")
ds.to_netcdf("file://foo.zarr#mode=nczarr,noxarray")

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Jul 31 22:56:07 hostname systemd-oomd[805]: Killed /user.slice/user-1000.slice/session-131.scope due to memory used (15294935040) / total (15569043456) and swap used (15308353536) / total (16978534400) being more than 90.00%
Jul 31 22:56:07 hostname systemd[1]: session-131.scope: systemd-oomd killed 6 process(es) in this unit.

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.11.0b5 (main, Jul 26 2022, 00:00:00) [GCC 12.1.1 20220628 (Red Hat 12.1.1-3)] python-bits: 64 OS: Linux OS-release: 5.17.13-300.fc36.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.9.0

xarray: 2022.6.0 pandas: 1.3.5 numpy: 1.22.0 scipy: 1.8.1 netCDF4: 1.6.0 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.11.3 cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.10 cfgrib: None iris: None bottleneck: None dask: 2022.7.1 distributed: None matplotlib: 3.5.2 cartopy: None seaborn: 0.11.2 numbagg: None fsspec: 2022.5.0 cupy: None pint: 0.16.1 sparse: None flox: None numpy_groupies: None setuptools: 62.6.0 pip: 22.2 conda: None pytest: 7.1.2 IPython: None sphinx: 5.0.2

QuLogic, Aug 01 '22 03:08

Hmm, I seem to be able to build on Fedora 36 (instead of Rawhide above), with the following versions:

INSTALLED VERSIONS

commit: None python: 3.10.5 (main, Jun 9 2022, 00:00:00) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)] python-bits: 64 OS: Linux OS-release: 5.17.13-300.fc36.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1

xarray: 2022.6.0 pandas: 1.3.5 numpy: 1.22.0 scipy: 1.8.1 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.11.3 cftime: 1.5.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.10 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2022.05.0 distributed: None matplotlib: 3.5.2 cartopy: None seaborn: 0.11.1 numbagg: None fsspec: 2022.5.0 cupy: None pint: 0.16.1 sparse: None flox: None numpy_groupies: None setuptools: 59.6.0 pip: 21.3.1 conda: None pytest: 6.2.5 IPython: None sphinx: 4.4.0

QuLogic, Aug 01 '22 03:08

Backporting netcdf 4.9.0 to Fedora 36 also OOMs, so the problem appears with netcdf 4.9.0, but I'm not sure whether it's a bug in netCDF itself or something that xarray is doing.

QuLogic, Aug 01 '22 06:08

Still broken in 2022.11.0; any ideas what to look at here?

QuLogic, Nov 26 '22 23:11

Hmmm, this runs basically instantly on my MacBook (netCDF4 1.6.0, libnetcdf 4.8.1), so I really don't know.

Can you try a pure netCDF4 example with no xarray?

INSTALLED VERSIONS

commit: None python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:47) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1

xarray: 2022.10.1.dev16+g9b22610bc pandas: 1.5.1 numpy: 1.23.4 scipy: 1.9.3 netCDF4: 1.6.0 pydap: installed h5netcdf: 1.0.2 h5py: 3.7.0 Nio: None zarr: 2.13.3 cftime: 1.6.2

dcherian, Nov 29 '22 03:11

I could, but I'm not sure what to try. I tried running https://github.com/Unidata/netcdf4-python/blob/master/examples/tutorial.py with those versions and that worked without issue.

QuLogic, Nov 29 '22 06:11

Seems to be a bug in the netcdf library. Simple reproducer:

#include <netcdf.h>

int main(int argc, char **argv) {
   int ncid;
   nc_create("file://foo.zarr#mode=nczarr,noxarray", 0, &ncid);
}

I'm going to see if it is still present in the latest netcdf git, and report upstream if needed.

opoplawski, Dec 16 '22 04:12

Building against netcdf with the patch backported by @opoplawski, I now get a single failure:

_________________________ TestNCZarr.test_open_nczarr __________________________
[gw3] linux -- Python 3.11.1 /usr/bin/python3
zarr_obj = <zarr.core.Array '/dim2' (9,) float64 read-only>
dimension_key = '_ARRAY_DIMENSIONS', try_nczarr = True
    def _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr):
        # Zarr arrays do not have dimensions. To get around this problem, we add
        # an attribute that specifies the dimension. We have to hide this attribute
        # when we send the attributes to the user.
        # zarr_obj can be either a zarr group or zarr array
        try:
            # Xarray-Zarr
>           dimensions = zarr_obj.attrs[dimension_key]
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/zarr.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <zarr.attrs.Attributes object at 0x7f697d934ad0>
item = '_ARRAY_DIMENSIONS'
    def __getitem__(self, item):
>       return self.asdict()[item]
E       KeyError: '_ARRAY_DIMENSIONS'
/usr/lib/python3.11/site-packages/zarr/attrs.py:74: KeyError
During handling of the above exception, another exception occurred:
zarr_obj = <zarr.core.Array '/dim2' (9,) float64 read-only>
dimension_key = '_ARRAY_DIMENSIONS', try_nczarr = True
    def _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr):
        # Zarr arrays do not have dimensions. To get around this problem, we add
        # an attribute that specifies the dimension. We have to hide this attribute
        # when we send the attributes to the user.
        # zarr_obj can be either a zarr group or zarr array
        try:
            # Xarray-Zarr
            dimensions = zarr_obj.attrs[dimension_key]
        except KeyError as e:
            if not try_nczarr:
                raise KeyError(
                    f"Zarr object is missing the attribute `{dimension_key}`, which is "
                    "required for xarray to determine variable dimensions."
                ) from e
    
            # NCZarr defines dimensions through metadata in .zarray
            zarray_path = os.path.join(zarr_obj.path, ".zarray")
            zarray = json.loads(zarr_obj.store[zarray_path])
            try:
                # NCZarr uses Fully Qualified Names
                dimensions = [
>                   os.path.basename(dim) for dim in zarray["_NCZARR_ARRAY"]["dimrefs"]
                ]
E               KeyError: '_NCZARR_ARRAY'
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/zarr.py:197: KeyError
The above exception was the direct cause of the following exception:
self = <xarray.tests.test_backends.TestNCZarr object at 0x7f697ed62f10>
    def test_open_nczarr(self) -> None:
        with create_tmp_file(suffix=".zarr") as tmp:
            expected = self._create_nczarr(tmp)
>           actual = xr.open_zarr(tmp, consolidated=False)
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/tests/test_backends.py:5741: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/zarr.py:819: in open_zarr
    ds = open_dataset(
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/api.py:540: in open_dataset
    backend_ds = backend.open_dataset(
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/zarr.py:897: in open_dataset
    ds = store_entrypoint.open_dataset(
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/store.py:28: in open_dataset
    vars, attrs = store.load()
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/common.py:128: in load
    (_decode_variable_name(k), v) for k, v in self.get_variables().items()
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/zarr.py:480: in get_variables
    return FrozenDict(
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/core/utils.py:469: in FrozenDict
    return Frozen(dict(*args, **kwargs))
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/zarr.py:481: in <genexpr>
    (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/zarr.py:457: in open_store_variable
    dimensions, attributes = _get_zarr_dims_and_attrs(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
zarr_obj = <zarr.core.Array '/dim2' (9,) float64 read-only>
dimension_key = '_ARRAY_DIMENSIONS', try_nczarr = True
    def _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr):
        # Zarr arrays do not have dimensions. To get around this problem, we add
        # an attribute that specifies the dimension. We have to hide this attribute
        # when we send the attributes to the user.
        # zarr_obj can be either a zarr group or zarr array
        try:
            # Xarray-Zarr
            dimensions = zarr_obj.attrs[dimension_key]
        except KeyError as e:
            if not try_nczarr:
                raise KeyError(
                    f"Zarr object is missing the attribute `{dimension_key}`, which is "
                    "required for xarray to determine variable dimensions."
                ) from e
    
            # NCZarr defines dimensions through metadata in .zarray
            zarray_path = os.path.join(zarr_obj.path, ".zarray")
            zarray = json.loads(zarr_obj.store[zarray_path])
            try:
                # NCZarr uses Fully Qualified Names
                dimensions = [
                    os.path.basename(dim) for dim in zarray["_NCZARR_ARRAY"]["dimrefs"]
                ]
            except KeyError as e:
>               raise KeyError(
                    f"Zarr object is missing the attribute `{dimension_key}` and the NCZarr metadata, "
                    "which are required for xarray to determine variable dimensions."
                ) from e
E               KeyError: 'Zarr object is missing the attribute `_ARRAY_DIMENSIONS` and the NCZarr metadata, which are required for xarray to determine variable dimensions.'
/builddir/build/BUILDROOT/python-xarray-2022.12.0-1.fc38.x86_64/usr/lib/python3.11/site-packages/xarray/backends/zarr.py:200: KeyError

Does that look like a problem in netcdf or a problem in xarray here?
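
One way to narrow that down is to look at what the store on disk actually contains. The traceback shows xarray checking the `_ARRAY_DIMENSIONS` attribute first and then the `_NCZARR_ARRAY` key in `.zarray`. Below is a minimal sketch, assuming the store is a plain directory in the Zarr v2 layout (attributes in `.zattrs` next to each `.zarray`), that reports which of the two each array carries. The function name describe_dimension_metadata is hypothetical, and only the standard library is used.

```python
import json
from pathlib import Path


def describe_dimension_metadata(store_dir):
    """Report, per array, which dimension metadata a Zarr v2 directory store has."""
    report = {}
    for zarray_file in sorted(Path(store_dir).rglob(".zarray")):
        array_dir = zarray_file.parent
        zarray = json.loads(zarray_file.read_text())
        # .zattrs is optional in Zarr v2; treat a missing file as no attributes.
        zattrs_file = array_dir / ".zattrs"
        zattrs = json.loads(zattrs_file.read_text()) if zattrs_file.exists() else {}
        name = array_dir.relative_to(store_dir).as_posix()
        report[name] = {
            # The attribute xarray looks up first.
            "has_array_dimensions": "_ARRAY_DIMENSIONS" in zattrs,
            # The key xarray falls back to in .zarray.
            "has_nczarr_array": "_NCZARR_ARRAY" in zarray,
            # Full key lists, to see where netCDF actually put its metadata.
            "zarray_keys": sorted(zarray),
            "zattrs_keys": sorted(zattrs),
        }
    return report
```

If both flags are False for an array such as dim2, the key lists should show whether the netCDF library wrote its metadata under a different name or location than the one xarray reads.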

QuLogic, Jan 14 '23 01:01

I think this was fixed by https://github.com/Unidata/netcdf-c/issues/2573

QuLogic, May 08 '23 09:05