xarray
Error converting zarr v3 to zarr v2: zarr 2 does not support "Serializer"
What happened?
I'm trying to convert a Zarr dataset from Zarr format (version) 3 to version 2. Xarray can successfully read Zarr v2 or v3, and I can create an xarray dataset in memory and write it to disk as either Zarr v2 or v3. The problem only appears when I read an existing Zarr v3 store and attempt to write it as Zarr v2. The opposite direction works: I can read an existing Zarr v2 store and write it as Zarr v3. Given that every other read/write scenario I've tried works fine, I'm guessing that reading a dataset from Zarr v3 sets some internal config that either doesn't exist in Zarr v2 or has an incompatible default there.
What did you expect to happen?
I expected this to be possible
xr.open_zarr('v3path.zarr', zarr_format=3).to_zarr('v2path.zarr', zarr_format=2)
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np
# generate a random array and write it to zarr v3
arr = np.random.rand(100, 100)
(
    xr
    .DataArray(arr, dims=['x', 'y'], name='data')
    .to_zarr(
        './test-v3.zarr',
        zarr_format=3
    )
)
# now try to convert it to a zarr 2
(
    xr
    .open_zarr('./test-v3.zarr')
    .to_zarr('test-v2.zarr', zarr_format=2)
)
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
ValueError: Zarr format 2 arrays do not support `serializer`.
Anything else we need to know?
No response
Environment
This is on a Google Colab machine, but I've also replicated it on Mac and Windows installations.
xarray: 2025.3.1
pandas: 2.2.2
numpy: 2.0.2
scipy: 1.15.3
netCDF4: None
pydap: None
h5netcdf: 1.6.1
h5py: 3.13.0
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.12.1
distributed: 2024.12.1
matplotlib: 3.10.0
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2025.3.2
cupy: 13.3.0
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.2.0
pip: 24.1.2
conda: None
pytest: 8.3.5
mypy: None
IPython: 7.34.0
sphinx: 8.2.3
I'm happy to make a pull request, but I need an explanation of what the serializer is and does. The "does not support" language in the error makes me unsure whether this is intended behavior or not. Either way, I feel that it should be possible to make this conversion.
Possibly related issue is #9987
I've also been running into this.
If it helps troubleshoot, loading the dataset into memory (using ds.load()) and then reconverting it to a dask array does not work as a workaround (ds.load().chunk(...).to_zarr(fn, zarr_format=2) results in the same error for me).
The one workaround I've found is saving it as NetCDF, reloading it, and then saving it as Zarr. Of course that's not ideal, since it kinda defeats the purpose of the exercise.
Can you look at the encoding of your variables ({n: v.encoding for n, v in ds.variables.items()})? I'd bet that's where this error is coming from. If you cleared the encoding (ds.drop_encoding()) or applied a function that translates Zarr v3 encoding to Zarr v2 encoding, you'd avoid the error.
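For anyone who wants to try this, here is a rough sketch of both options. It is not a confirmed fix. The helper names (strip_v3_only_encoding, convert_v3_to_v2) and the set of keys treated as "v3-only" are my own assumptions for illustration; only "serializer" is named in the error above, so check the actual encoding of your variables before relying on the list.

```python
# Encoding keys ASSUMED to be Zarr v3-only. Only "serializer" appears in the
# error message from this issue; the others are guesses to verify locally.
V3_ONLY_KEYS = frozenset({"serializer", "compressors", "shards"})


def strip_v3_only_encoding(encoding, v3_only_keys=V3_ONLY_KEYS):
    """Return a copy of an encoding dict without the assumed v3-only keys.

    Hypothetical helper: the input dict is left unmodified.
    """
    return {k: v for k, v in encoding.items() if k not in v3_only_keys}


def convert_v3_to_v2(src, dst, drop_all=True):
    """Hypothetical wrapper: read a Zarr v3 store and write it as Zarr v2."""
    import xarray as xr  # imported lazily so the helper above has no dependencies

    ds = xr.open_zarr(src)
    if drop_all:
        # Option 1: clear all encoding, as suggested in the comment above.
        ds = ds.drop_encoding()
    else:
        # Option 2: remove only the keys assumed to be v3-specific.
        for var in ds.variables.values():
            var.encoding = strip_v3_only_encoding(var.encoding)
    ds.to_zarr(dst, zarr_format=2)
```

With the paths from the example above, this would be called as convert_v3_to_v2('./test-v3.zarr', 'test-v2.zarr'). Note that ds.drop_encoding() also discards settings such as dtype, fill value and chunking, so the written store falls back to defaults; the selective option keeps those but may still fail if other encoding entries are incompatible with Zarr v2.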