to_zarr with regions does not respect dim names -- only order.
What happened?
If I have two datasets with the same set of coordinate labels but in different orders,
then to_zarr with region writes them to the store in the order given, rather than ensuring that the labels agree.
So when reading them back, the values are swapped around.
Perhaps I am wrong and I am actually explicitly opting out of this check by using slice(None, None)?
It's unclear to me what the difference is between slice(None, None) and "auto".
Possibly the docs should be clearer that it is your responsibility to check that the coords are in the same order when passing region, unless you use "auto"?
What did you expect to happen?
I expect that writing to the next position in a zarr via to_zarr and passing the next region is just like concatenating on that dimension.
Or at least that it would raise an error if the coords did not agree.
Minimal Complete Verifiable Example
import pandas as pd
import xarray as xr
x = xr.DataArray(pd.Series({"a": 1, "b": 10}), dims=("foo",))
y = xr.DataArray(pd.Series({"b": 20, "a": 2}), dims=("foo",))
z = xr.concat([x,y], dim="bar", join="inner")
expected = z.sel({"foo":"b"})
##############
# combining via regions:
z0 = 0*z
xr.Dataset({"dat": z0}).to_zarr("~/temp/4.zarr")
xr.open_zarr("~/temp/4.zarr").sel({"foo": "b"}).compute()
xb = xr.concat([x], dim="bar")
xr.Dataset({"dat": xb}).to_zarr("~/temp/4.zarr",
    region={"foo": slice(None, None), "bar": slice(0, 1)},
    #region={"foo": "auto", "bar": slice(0, 1)},
)
yb = xr.concat([y], dim="bar")
xr.Dataset({"dat": yb}).to_zarr("~/temp/4.zarr",
    region={"foo": slice(None, None), "bar": slice(1, 2)},
    #region={"foo": "auto", "bar": slice(1, 2)},
)
actual = xr.open_zarr("~/temp/4.zarr").sel({"foo": "b"}).compute().dat
assert all(expected == actual)
Displaying whole arrays:
In [12]: z # full expected
Out[12]:
<xarray.DataArray (bar: 2, foo: 2)> Size: 32B
array([[ 1, 10],
       [ 2, 20]])
Coordinates:
  * foo      (foo) object 16B 'a' 'b'
Dimensions without coordinates: bar
In [13]: xr.open_zarr("~/temp/4.zarr").dat.compute() # full actual
Out[13]:
<xarray.DataArray 'dat' (bar: 2, foo: 2)> Size: 32B
array([[ 1, 10],
       [20,  2]])
Coordinates:
  * foo      (foo) object 16B 'a' 'b'
Dimensions without coordinates: bar
You can see that the values in the second row (bar=1) were swapped.
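To make the mechanism concrete, here is a small in-memory sketch using only NumPy. It assumes that a region write with an explicit slice behaves like plain positional assignment, which is what the output above suggests; the variable names are mine and nothing here touches zarr.

```python
import numpy as np

# Order of the "foo" labels already in the store, and a zeroed 2x2 array
# standing in for the "dat" variable (rows are "bar", columns are "foo").
store_labels = ["a", "b"]
positional = np.zeros((2, 2), dtype=int)
label_aware = np.zeros((2, 2), dtype=int)

x_labels, x_values = ["a", "b"], [1, 10]
y_labels, y_values = ["b", "a"], [20, 2]

# Positional write: the labels are never consulted, so y lands swapped.
positional[0:1, :] = x_values
positional[1:2, :] = y_values

# Label-aware write: reorder each row to the store's label order first.
label_aware[0:1, :] = [x_values[x_labels.index(lab)] for lab in store_labels]
label_aware[1:2, :] = [y_values[y_labels.index(lab)] for lab in store_labels]

print(positional)
print(label_aware)
```

The first array reproduces the swapped result from the store, the second is what concat gives.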
Steps to reproduce
- Create a DataArray with some coords.
- Create a second DataArray with the same coords but in a different order.
- Create an empty zarr store with the same dims and coords as the first of those, plus an extra dim for concatenating along.
- Add the concatenation dimension to the first, and write it to the zarr store, passing region as slice(None, None) for the dimensions in common and slice(0, 1) for the concatenation dimension.
- Do the same for the second, but pass slice(1, 2) for the concatenation dimension.
- Read the zarr store back and see that the values are swapped.
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
/home/frames/repos/biohaus/.pixi/envs/dev/lib/python3.12/site-packages/zarr/api/asynchronous.py:244: ZarrUserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
warnings.warn(
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[2], line 47
40 xr.Dataset({"dat": yb}).to_zarr("~/temp/4.zarr",
41 region={"foo": slice(None, None), "bar": slice(1, 2)},
42 #region={"foo": "auto", "bar": slice(1, 2)},
43 )
45 actual = xr.open_zarr("~/temp/4.zarr").sel({"foo": "b"}).compute().dat
---> 47 assert all(expected == actual)
AssertionError:
Anything else we need to know?
If, rather than passing slice(None, None) for the known dimension, one passes "auto",
then one does get a good error message:
ValueError: The auto-detected region of coordinate 'foo' for writing new data to the original store had non-contiguous indices. Writing to a zarr region slice requires that the new data constitute a contiguous subset of the original store.
I think this error should always be raised, even if the user doesn't use "auto".
Or to_zarr should, like concat, expose different options for how to combine/join:
https://docs.xarray.dev/en/latest/generated/xarray.concat.html
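In the meantime, a possible user-side workaround is to reorder the new data to the store's label order before writing the region. The sketch below is only an illustration: `match_store_order` is a hypothetical helper of mine, not part of xarray, and it assumes the labels are unique and sortable.

```python
import pandas as pd
import xarray as xr


def match_store_order(new, store_labels, dim):
    """Return `new` reordered along `dim` to match `store_labels`.

    Hypothetical helper, not an xarray API. Raises if the two sets of
    labels differ, because a plain reorder cannot fix that.
    """
    store_labels = list(store_labels)
    new_labels = new[dim].values.tolist()
    if sorted(new_labels) != sorted(store_labels):
        raise ValueError(
            f"labels along {dim!r} differ: {new_labels} vs {store_labels}"
        )
    # Label-based selection puts the data in the store's order.
    return new.sel({dim: store_labels})


y = xr.DataArray(pd.Series({"b": 20, "a": 2}), dims=("foo",))
y_fixed = match_store_order(y, ["a", "b"], "foo")
print(y_fixed.values)
```

In the reproducer, `store_labels` would come from something like `xr.open_zarr("~/temp/4.zarr")["foo"].values`, and `y_fixed` would replace `y` before the region write. That does mean reading the coordinate from the store once.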
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.12 | packaged by conda-forge | (main, Oct 22 2025, 23:25:55) [GCC 14.3.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-62-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.10.1
pandas: 2.3.3
numpy: 2.3.4
scipy: 1.16.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.12.1
distributed: 2024.12.1
matplotlib: 3.10.7
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.9.0
cupy: None
pint: 0.24.4
sparse: 0.15.5
flox: None
numpy_groupies: None
setuptools: 80.9.0
pip: 24.3.1
conda: None
pytest: 8.4.2
mypy: 1.18.2
IPython: 9.6.0
sphinx: None
Hmm, I see what you mean. It seems to me that in order to determine whether the order of the coordinates matches those that have already been written, we'd have to read the existing values from the Zarr store. That is what happens when you use "auto". It doesn't seem great performance-wise to let that kind of reading creep out into other parts of the code. So I think you are right in your suggestion that this is a chance for the documentation to be improved, to make it clear what the expectations are when doing this kind of write operation.
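For anyone wondering why "auto" needs to read from the store, here is a rough pure-Python sketch of the kind of check it implies: look up where each new label sits among the labels already written, and require those positions to form one contiguous, increasing run. The name `detect_region` and the error text are made up for illustration; this is not xarray's actual implementation.

```python
def detect_region(store_labels, new_labels):
    """Map `new_labels` onto positions in `store_labels` and return a slice.

    Illustrative only. Needs the store's labels, which is why an
    "auto"-style check has to read the existing coordinate.
    """
    position = {label: i for i, label in enumerate(store_labels)}
    missing = [label for label in new_labels if label not in position]
    if missing:
        raise KeyError(f"labels not found in the store: {missing}")
    idxs = [position[label] for label in new_labels]
    if not idxs:
        raise ValueError("no labels to write")
    # Positions must increase by exactly 1, otherwise no single slice fits.
    if any(b - a != 1 for a, b in zip(idxs, idxs[1:])):
        raise ValueError("new labels map to non-contiguous indices")
    return slice(idxs[0], idxs[-1] + 1)


print(detect_region(["a", "b"], ["a", "b"]))
```

With the labels from the reproducer, `detect_region(["a", "b"], ["b", "a"])` raises a ValueError, which corresponds to the error quoted in the report, whereas an explicit slice(None, None) skips any such lookup.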