zarr-ZipStore problems: no filename support, duplicate metadata on ZipStore-write, can't read from ZipStore via filename
What happened?
Recent versions of xarray have several regressions when handling Zarr stores through ZipStore. First, ds.to_zarr() no longer transparently creates a ZipStore-based Zarr when given a .zarr.zip path; it writes a directory store at that path instead:
In [1]: import xarray as xr; import zarr; import numpy as np
In [2]: xr.__version__
Out[2]: '2025.10.0'
In [3]: ds = xr.Dataset(data_vars={'foo' : np.arange(3)})
In [4]: out_store = ds.to_zarr('./in_2024_01_is_a.zarr.zip',compute=True,zarr_format=2,consolidated=True)
In [5]: out_store.close()
In [7]: !ls -aR ./in_2024_01_is_a.zarr.zip
./in_2024_01_is_a.zarr.zip:
. .. foo .zattrs .zgroup .zmetadata
./in_2024_01_is_a.zarr.zip/foo:
. .. 0 .zarray .zattrs
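The symptom above can be checked without xarray at all. The following is a minimal stdlib-only sketch; `store_kind` is a hypothetical helper name introduced here for illustration, not part of xarray or zarr:

```python
import os
import zipfile


def store_kind(path):
    """Classify what a writer actually left at `path`."""
    if os.path.isdir(path):
        return "directory"   # the directory store seen in the listing above
    if zipfile.is_zipfile(path):
        return "zip"         # what a .zarr.zip path was expected to yield
    if os.path.exists(path):
        return "other file"
    return "missing"
```

Given the `ls` output above, `store_kind('./in_2024_01_is_a.zarr.zip')` should report a directory rather than a zip archive.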
Second, when forcing zip output via zarr.storage.ZipStore, to_zarr writes each metadata file twice, producing duplicate zip entries. This happens even without consolidated metadata (and also with it):
In [1]: import xarray as xr; import zarr; import numpy as np
In [2]: ds = xr.Dataset(data_vars={'foo' : np.arange(3)})
In [3]: zipstore = zarr.storage.ZipStore('zipstore.zarr.zip',mode='w',read_only=False)
In [5]: out_store = ds.to_zarr(zipstore,compute=True,zarr_format=2,consolidated=False)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: '.zgroup'
return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: '.zattrs'
return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: 'foo/.zarray'
return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: 'foo/.zattrs'
return self._open_to_write(zinfo, force_zip64=force_zip64)
In [6]: out_store.close()
In [8]: zipstore.close()
In [9]: !unzip -l zipstore.zarr.zip
Archive: zipstore.zarr.zip
Length Date Time Name
--------- ---------- ----- ----
22 10-07-2025 11:28 .zgroup
2 10-07-2025 11:28 .zattrs
22 10-07-2025 11:28 .zgroup
2 10-07-2025 11:28 .zattrs
292 10-07-2025 11:28 foo/.zarray
2 10-07-2025 11:28 foo/.zattrs
292 10-07-2025 11:28 foo/.zarray
42 10-07-2025 11:28 foo/.zattrs
40 10-07-2025 11:28 foo/0
--------- -------
716 9 files
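The duplicates can also be detected programmatically instead of by reading the `unzip -l` listing. This is a minimal stdlib-only sketch; `duplicate_entries` is a hypothetical helper name, and it relies only on `zipfile.ZipFile.infolist()` returning one entry per stored member:

```python
import zipfile
from collections import Counter


def duplicate_entries(path):
    """Return {name: count} for every member name stored more than once."""
    with zipfile.ZipFile(path) as zf:
        counts = Counter(info.filename for info in zf.infolist())
    return {name: n for name, n in counts.items() if n > 1}
```

For the archive listed above, this would be expected to report the four metadata names (.zgroup, .zattrs, foo/.zarray, foo/.zattrs), each with a count of 2.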
Finally, once created, a zipped store can no longer be read transparently by path with load_dataset or open_zarr; it has to be opened through an intermediate ZipStore object:
In [1]: import xarray as xr; import zarr; import numpy as np
In [2]: ds = xr.load_dataset('./zipstore.zarr.zip',engine='zarr',zarr_format=2) # similar error with open_zarr
---------------------------------------------------------------------------
FileExistsError Traceback (most recent call last)
Cell In[2], line 1
----> 1 ds = xr.load_dataset('./zipstore.zarr.zip',engine='zarr',zarr_format=2)
[...]
FileExistsError: [Errno 17] File exists: '/fs/site6/eccc/mrd/rpnatm/csu001/ppp6/nnja/zipstore.zarr.zip'
All of this apparently worked in 2024.0.1, and I suspect the regressions were introduced around the time Zarr 3 was adopted.
What did you expect to happen?
No response
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import xarray as xr
xr.show_versions()
# your reproducer code ...
import numpy as np
import zarr
import os
import warnings
ds = xr.Dataset(data_vars = {'foo' : (('dim1',), np.arange(10))})
# Reproduction 1: .zarr.zip filename creates a directory
out_store = ds.to_zarr('test.zarr.zip',zarr_format=2)
out_store.close()
assert not os.path.isdir('./test.zarr.zip')
# Reproduction 2: duplicate metadata entries when writing with ZipStore
zipstore = zarr.storage.ZipStore('zipstore.zarr.zip',mode='w',read_only=False)
with warnings.catch_warnings():
    # Escalate zipfile's "Duplicate name" UserWarning to an error
    warnings.simplefilter(action='error', category=UserWarning)
    out_store = ds.to_zarr(zipstore, zarr_format=2, consolidated=False)
out_store.close()
zipstore.close()
Steps to reproduce
No response
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:45:31) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-240.el8.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.10.0
pandas: 2.3.2
numpy: 2.2.6
scipy: 1.16.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.2
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.7.0
distributed: None
matplotlib: 3.10.6
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 80.9.0
pip: 25.2
conda: None
pytest: None
mypy: None
IPython: 9.5.0
sphinx: None
As an addendum, reading a zipped store via a URL-style spec does work (xr.open_zarr('zip::test.zarr.zip')), but writing a zipped store with the same kind of spec fails. This might be an upstream bug in zarr.
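For anyone needing a workaround in the meantime, the two read paths mentioned in this report (an explicit ZipStore and the `zip::` URL spec) can be sketched as follows. This is an untested sketch, not a proposed fix: `zip_url` and `open_zipped_zarr` are hypothetical helper names, and it assumes that xr.open_zarr accepts a read-only zarr.storage.ZipStore and that loading the data before closing the store is acceptable:

```python
import os


def zip_url(path):
    """Build the URL-style spec that the report says still reads correctly."""
    return "zip::" + os.fspath(path)


def open_zipped_zarr(path, zarr_format=2):
    """Open a zipped Zarr through an explicit read-only ZipStore (assumed workaround)."""
    # Imported lazily so zip_url() stays usable without xarray/zarr installed.
    import xarray as xr
    import zarr

    store = zarr.storage.ZipStore(path, mode="r")
    try:
        # load() pulls the data into memory so the store can be closed afterwards.
        return xr.open_zarr(store, zarr_format=zarr_format).load()
    finally:
        store.close()
```

Usage would be either `open_zipped_zarr('zipstore.zarr.zip')` or `xr.open_zarr(zip_url('test.zarr.zip'))`; neither addresses the write-side problems above.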