Inconsistent chunking between `xr.open_zarr` and `xr.open_dataset(..., engine='zarr')` with `chunks="auto"`

Open norlandrhagen opened this issue 1 week ago • 5 comments

What happened?

Hi there 👋

I was chatting with @jsignell today about chunk sizes and we came across some potentially inconsistent chunking behavior between `xr.open_zarr` and `xr.open_dataset(..., engine='zarr')`, which I assumed would be identical.

# xr.__version__ : '2025.9.1'
# zarr.__version__ : '3.1.3'
import xarray as xr 

ds = xr.tutorial.open_dataset('air_temperature', chunks={})
ds_rechunked = ds.chunk({'time':100,'lat':25, 'lon':53})
ds_rechunked.to_zarr('air_temperature.zarr', consolidated=False, zarr_format=3)

ds1 = xr.open_zarr('air_temperature.zarr', consolidated=False,chunks="auto")
ds1.chunks

# Frozen({'time': (100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
#         100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 20), 'lat': (25,), 'lon': (53,)})

ds2 = xr.open_dataset('air_temperature.zarr', consolidated=False,chunks="auto")
ds2.chunks

# Frozen({'time': (2920,), 'lat': (25,), 'lon': (53,)})


from xarray.testing import assert_chunks_equal

assert_chunks_equal(ds1, ds2)

# AssertionError: 

What did you expect to happen?

The same chunking behavior from `xr.open_zarr(...)` and `xr.open_dataset(..., engine='zarr')`.

norlandrhagen • Dec 11 '25 02:12

I looked at the code and it is pretty obvious why this is happening:

https://github.com/pydata/xarray/blob/f25211928e50063b34e00f04cc9ff6d1468ea486/xarray/backends/zarr.py#L1561-L1567

I think if we just delete that one line (https://github.com/pydata/xarray/blob/f25211928e50063b34e00f04cc9ff6d1468ea486/xarray/backends/zarr.py#L1567), the behavior will match. But that would change the default chunk size for this function, so I'm pretty wary of deleting that line!
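Roughly, the linked lines amount to the following. This is a simplified paraphrase rather than the actual xarray source: `resolve_open_zarr_chunks` and the `has_chunk_manager` flag are made-up names standing in for the chunk manager lookup.

```python
def resolve_open_zarr_chunks(chunks, has_chunk_manager):
    """Hypothetical stand-in for how open_zarr rewrites chunks="auto"."""
    if chunks == "auto":
        # The override under discussion: "auto" becomes {} (use the on-disk
        # chunks) when a chunk manager such as dask is available, else None.
        return {} if has_chunk_manager else None
    # Anything else is forwarded to open_dataset untouched.
    return chunks


print(resolve_open_zarr_chunks("auto", has_chunk_manager=True))   # {}
print(resolve_open_zarr_chunks("auto", has_chunk_manager=False))  # None
```

Because `open_dataset` has no such rewrite, its `"auto"` reaches the chunk manager as-is, which would explain the single `(2920,)` chunk along `time` in the report.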

Ideally I think there would be a version of "auto" that never splits any chunks. If that existed I think it would be the ideal way of opening a Zarr and xarray wouldn't have to worry about: https://github.com/pydata/xarray/blob/00d18bfd0cb6e3b3b54345e66acf98b35b8ec127/xarray/namedarray/utils.py#L247-L263
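To make the idea concrete, here is a minimal sketch of an "auto" that only ever merges whole on-disk chunks. The helper name `merge_whole_chunks`, the byte budget argument, and the greedy last-axis-first order are all assumptions for illustration; this is not how xarray or dask pick chunks today.

```python
import math


def merge_whole_chunks(shape, native_chunks, itemsize, target_bytes):
    """Grow chunks by whole multiples of the on-disk chunks, never splitting them.

    Hypothetical helper for illustration only.
    """
    chunks = list(native_chunks)
    # Visit the axes from last to first, growing each one as far as the budget allows.
    for axis in reversed(range(len(shape))):
        # Elements per chunk contributed by all the other axes.
        other = math.prod(c for i, c in enumerate(chunks) if i != axis)
        # How many native chunks along this axis fit in the byte budget.
        budget = target_bytes // (other * itemsize * native_chunks[axis])
        # How many native chunks exist along this axis (the last may be partial).
        n_native = math.ceil(shape[axis] / native_chunks[axis])
        # Always keep at least one native chunk, so nothing is ever split.
        factor = max(1, min(budget, n_native))
        chunks[axis] = min(shape[axis], factor * native_chunks[axis])
    return tuple(chunks)


# Shapes from the example above, assuming 8-byte values and a 4 MiB budget.
print(merge_whole_chunks((2920, 25, 53), (100, 25, 53), 8, 4 * 2**20))
# (300, 25, 53)
```

With a budget smaller than a single stored chunk, the sketch falls back to the on-disk chunks instead of splitting them, which is the property I'm after.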

jsignell • Dec 12 '25 15:12

> Ideally I think there would be a version of "auto" that never splits any chunks

I think this is partly why Xarray normalizes "auto"-chunks.

I agree with deleting that line ;)

dcherian • Dec 12 '25 15:12

see also https://github.com/pydata/xarray/issues/7495#issuecomment-1852184846

keewis • Dec 12 '25 15:12

Yes, exactly @keewis! Do you still think the special handling for "auto" in `open_zarr` should be pushed down into `open_dataset`, or are you good with the idea of just deleting the `chunks = {}` override?

jsignell • Dec 12 '25 17:12

I personally don't even use open_zarr, so removing the only difference between the two sounds like a win to me.

However, what I meant two years ago was that open_zarr would default to

xr.open_zarr(path, chunks="native")  # or similar
# open_dataset translates that to `chunks = {}`

which would then be different from

xr.open_zarr(path, chunks="auto")  # `chunks="auto"` is handled by the chunk manager

Not sure if introducing a new string alias is a good idea, but given that dict default values are frowned upon (and the standard `None` is already taken), it might still be helpful. That is, until `frozendict` arrives in Python, should PEP 814 be accepted.
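A minimal sketch of the alias idea, assuming a made-up helper called `normalize_chunks`. The `"native"` spelling is only the proposal from above and is not something xarray accepts today.

```python
def normalize_chunks(chunks="native"):
    """Hypothetical translation of a "native" alias; not part of xarray's API."""
    if chunks == "native":
        # A fresh dict per call, so no mutable default is shared between calls.
        return {}
    # "auto", None, ints and explicit dicts pass through unchanged.
    return chunks


print(normalize_chunks())        # {}
print(normalize_chunks("auto"))  # auto
print(normalize_chunks(None))    # None
```

The string default keeps the signature immutable while leaving `None` free to keep its current meaning.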

keewis • Dec 12 '25 20:12