Issue with xarray and zarr when dimensions start with "meta"
What happened?
I'm working with xarray for MRSI analysis. I have been using xarray datasets with one of the dimensions labeled "metabolite". Previously I was able to save and load the data to zarr with no issues using xr.Dataset.to_zarr and xr.open_zarr.
Now zarr raises an error that complains about this dimension name starting with "meta". I think this may be due to a new version of zarr. I have copied the error below.
When I changed the dimension name (i.e. the zarr subfolder and .zmetadata file) from "metabolite" to something that doesn't start with "meta" then I can load the data properly.
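For anyone who wants to try the same on-disk workaround, here is a minimal sketch using only the standard library. It assumes a zarr v2 directory store with consolidated metadata, where `.zmetadata` is a JSON document whose `"metadata"` mapping is keyed by paths such as `metabolite/.zarray`. The function name `rename_array_in_store` is hypothetical, and the sketch does not touch the `_ARRAY_DIMENSIONS` attributes that xarray uses to record dimension names, so treat it as a stopgap and work on a copy of the data.

```python
import json
from pathlib import Path


def rename_array_in_store(store_dir: str, old: str, new: str) -> None:
    """Rename one array's subfolder and its keys in .zmetadata (hypothetical helper)."""
    store = Path(store_dir)

    # Move the array's folder, e.g. <store>/metabolite -> <store>/compound.
    (store / old).rename(store / new)

    # Rewrite the matching key prefixes in the consolidated metadata, if present.
    zmetadata = store / ".zmetadata"
    if zmetadata.exists():
        doc = json.loads(zmetadata.read_text())
        prefix = old + "/"
        doc["metadata"] = {
            (new + "/" + key[len(prefix):] if key.startswith(prefix) else key): value
            for key, value in doc["metadata"].items()
        }
        zmetadata.write_text(json.dumps(doc, indent=4))
```

A call such as `rename_array_in_store("example.zarr", "metabolite", "compound")` would cover the two changes described above (the path and the new name here are placeholders).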
Someone may need to modify the xr.Dataset.to_zarr and xr.open_zarr functions in case an xarray user decides to create a dimension that starts with "meta" and wants to save their dataset to zarr.
Please let me know if you need me to send any additional info.
What did you expect to happen?
I expected to load an xarray dataset saved by xr.Dataset.to_zarr() using xr.open_zarr(). However, zarr threw an error because one of the dimensions (metabolite) started with "meta".
Minimal Complete Verifiable Example
No response
MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 xr.open_zarr("/home/nonroot/data1/Data/MIDAS/U01_Midas/QINU01EM004/nnfit/09_10_2014.data3D.zarr/")
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/zarr.py:789, in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, chunk_store, storage_options, decode_timedelta, use_cftime, **kwargs)
776 raise TypeError(
777 "open_zarr() got unexpected keyword arguments " + ",".join(kwargs.keys())
778 )
780 backend_kwargs = {
781 "synchronizer": synchronizer,
782 "consolidated": consolidated,
(...)
786 "stacklevel": 4,
787 }
--> 789 ds = open_dataset(
790 filename_or_obj=store,
791 group=group,
792 decode_cf=decode_cf,
793 mask_and_scale=mask_and_scale,
794 decode_times=decode_times,
795 concat_characters=concat_characters,
796 decode_coords=decode_coords,
797 engine="zarr",
798 chunks=chunks,
799 drop_variables=drop_variables,
800 backend_kwargs=backend_kwargs,
801 decode_timedelta=decode_timedelta,
802 use_cftime=use_cftime,
803 )
804 return ds
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/api.py:531, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
519 decoders = _resolve_decoders_kwargs(
520 decode_cf,
521 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
527 decode_coords=decode_coords,
528 )
530 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 531 backend_ds = backend.open_dataset(
532 filename_or_obj,
533 drop_variables=drop_variables,
534 **decoders,
535 **kwargs,
536 )
537 ds = _dataset_from_backend_dataset(
538 backend_ds,
539 filename_or_obj,
(...)
547 **kwargs,
548 )
549 return ds
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/zarr.py:851, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
849 store_entrypoint = StoreBackendEntrypoint()
850 with close_on_error(store):
--> 851 ds = store_entrypoint.open_dataset(
852 store,
853 mask_and_scale=mask_and_scale,
854 decode_times=decode_times,
855 concat_characters=concat_characters,
856 decode_coords=decode_coords,
857 drop_variables=drop_variables,
858 use_cftime=use_cftime,
859 decode_timedelta=decode_timedelta,
860 )
861 return ds
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/store.py:26, in StoreBackendEntrypoint.open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
14 def open_dataset(
15 self,
16 store,
(...)
24 decode_timedelta=None,
25 ):
---> 26 vars, attrs = store.load()
27 encoding = store.get_encoding()
29 vars, attrs, coord_names = conventions.decode_cf_variables(
30 vars,
31 attrs,
(...)
38 decode_timedelta=decode_timedelta,
39 )
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/common.py:125, in AbstractDataStore.load(self)
103 def load(self):
104 """
105 This loads the variables and attributes simultaneously.
106 A centralized loading function makes it easier to create
(...)
122 are requested, so care should be taken to make sure its fast.
123 """
124 variables = FrozenDict(
--> 125 (_decode_variable_name(k), v) for k, v in self.get_variables().items()
126 )
127 attributes = FrozenDict(self.get_attrs())
128 return variables, attributes
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/zarr.py:461, in ZarrStore.get_variables(self)
460 def get_variables(self):
--> 461 return FrozenDict(
462 (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
463 )
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/core/utils.py:474, in FrozenDict(*args, **kwargs)
473 def FrozenDict(*args, **kwargs) -> Frozen:
--> 474 return Frozen(dict(*args, **kwargs))
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/zarr.py:461, in <genexpr>(.0)
460 def get_variables(self):
--> 461 return FrozenDict(
462 (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
463 )
File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/zarr/hierarchy.py:603, in Group._array_iter(self, keys_only, method, recurse)
601 for key in sorted(listdir(self._store, self._path)):
602 path = self._key_prefix + key
--> 603 assert not path.startswith("meta")
604 if contains_array(self._store, path):
605 _key = key.rstrip("/")
AssertionError:
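The last frame of the traceback shows the check that fails: each key in the group is prefixed and then asserted not to start with "meta". The sketch below mimics that loop in plain Python, without zarr, to show which key would trip it. The function name `first_rejected_key` is hypothetical, and the empty default prefix is an assumption for a root-level group.

```python
def first_rejected_key(keys, key_prefix=""):
    """Return the first key that would fail the startswith("meta") check, or None."""
    for key in sorted(keys):
        path = key_prefix + key
        # Mirrors `assert not path.startswith("meta")` from the traceback,
        # but reports the offending key instead of raising.
        if path.startswith("meta"):
            return key
    return None


print(first_rejected_key(["x", "y", "metabolite", "signal"]))  # metabolite
print(first_rejected_key(["x", "y", "compound", "signal"]))    # None
```

Any name that begins with "meta" is caught by this prefix test, which matches the observation that renaming the dimension avoids the error.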
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.5 (main, Jul 31 2022, 18:17:20) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-43-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.1
scipy: 1.9.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.12.0
distributed: 2021.12.0
matplotlib: 3.5.2
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 58.1.0
pip: 22.2.1
conda: None
pytest: 6.2.5
IPython: 8.4.0
sphinx: None
Hi @zndr27, thanks for raising this issue.
It should be possible to create a zarr store directly using zarr-python, make sure it contains an array whose name starts with "meta", save it, and open it again using zarr directly. I suspect that if you try that you might find that this is a bug in zarr rather than in xarray.
It would be really helpful if you could try doing that, and either raise an issue on the zarr issue tracker if the bug still exists, or comment again here to say that you still think the problem is with xarray's code.