
to_netcdf broken encoding: dtype='S1' + chunksizes

Open crusaderky opened this issue 7 years ago • 2 comments

import xarray
xarray.Dataset({'x': ['foo', 'bar', 'baz']}).to_netcdf(
    'foo.nc', engine='h5netcdf',
    encoding={'x': {'dtype': 'S1', 'zlib': True, 'chunksizes': (2, )}})

ValueError: "chunks" must have same rank as dataset shape

Same with engine='netcdf4'. The issue is present in 0.10.6 as well as in 0.10.3. The problem is that dtype='S1' changes the shape of the variable before passing it to the backend (the strings become a character array with an extra trailing dimension, here of length 3, so the shape goes from (3,) to (3, 3)), but it does not adjust any chunksizes setting to match.
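To make the shape change concrete, here is a minimal pure-Python sketch of what a dtype='S1' encoding does to a 1-D string variable, assuming the strings are UTF-8 encoded and NUL-padded to a common width. The helpers `to_char_array` and `shape_of` are hypothetical (xarray does the equivalent internally with NumPy); they only illustrate why a rank-1 chunksizes no longer matches the rank-2 array the backend receives.

```python
def to_char_array(strings, encoding="utf-8"):
    """Hypothetical helper: mimic S1 encoding of a 1-D list of strings.

    Each string becomes a row of single-byte items, NUL-padded to the
    longest encoded string, so an input of shape (n,) becomes (n, width).
    """
    encoded = [s.encode(encoding) for s in strings]
    width = max((len(b) for b in encoded), default=0)
    # bytes([c]) yields a length-1 bytes object, analogous to one S1 element
    return [[bytes([c]) for c in b.ljust(width, b"\0")] for b in encoded]


def shape_of(rows):
    """Shape of a rectangular list of lists, as a tuple."""
    return (len(rows), len(rows[0]) if rows else 0)


chars = to_char_array(["foo", "bar", "baz"])
print(shape_of(chars))  # (3, 3): one extra trailing dimension
print(chars[0])         # [b'f', b'o', b'o']

# The user's chunksizes was written for the original rank-1 shape (3,)
user_chunksizes = (2,)
print(len(user_chunksizes) == len(shape_of(chars)))  # False: rank mismatch
```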

The workaround is to omit chunksizes or set it to True.

crusaderky, Jun 7, 2018 23:06

It looks like this version works:

import xarray
xarray.Dataset({'x': ['foo', 'bar', 'baz']}).to_netcdf(
    'foo.nc', engine='h5netcdf',
    encoding={'x': {'dtype': 'S1', 'zlib': True, 'chunksizes': (2, 3)}})

I suppose we could update chunksizes to accept both versions? Or just clearly document this behavior?
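One way to read the "accept both versions" option is as a normalization step applied before the encoding reaches the backend. The sketch below is hypothetical (`normalize_chunksizes` is not an xarray function) and assumes that, when the user gives chunk sizes only for the original dimensions, the character dimension should be stored as a single chunk spanning the full string width.

```python
def normalize_chunksizes(chunksizes, original_ndim, char_len):
    """Hypothetical: accept chunksizes for either the original or encoded rank."""
    chunksizes = tuple(chunksizes)
    if len(chunksizes) == original_ndim:
        # Rank of the user's variable: append one chunk covering each string
        return chunksizes + (char_len,)
    if len(chunksizes) == original_ndim + 1:
        # Already matches the encoded (character array) rank: pass through
        return chunksizes
    raise ValueError(
        f"chunksizes {chunksizes} must have {original_ndim} or "
        f"{original_ndim + 1} entries"
    )


print(normalize_chunksizes((2,), original_ndim=1, char_len=3))    # (2, 3)
print(normalize_chunksizes((2, 3), original_ndim=1, char_len=3))  # (2, 3)
```

The trade-off is that both spellings silently work, which is convenient but keeps the hidden character dimension part of the user-facing API.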

shoyer, Jun 8, 2018 01:06

IMHO the trick that alters the shape of the array is strictly an implementation detail that should not be exposed to the end user. If xarray's implementation alters the shape of the variable, it should also alter anything that depends on that shape. So I think that chunksizes=(2, 3) should not be accepted as a valid input.
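Under this position the character dimension stays invisible: users give chunk sizes only for the dimensions they can see, and the encoder extends them. A minimal sketch of that stricter rule, again with a hypothetical helper name and assuming a single chunk along the character dimension, rejects the encoded-rank form (2, 3) instead of accepting it.

```python
def encode_chunksizes_strict(chunksizes, original_ndim, char_len):
    """Hypothetical: chunksizes must match the user's variable rank only."""
    chunksizes = tuple(chunksizes)
    if len(chunksizes) != original_ndim:
        raise ValueError(
            f"chunksizes {chunksizes} must have {original_ndim} entries, "
            "one per dimension of the variable"
        )
    # The hidden character dimension is an implementation detail, so the
    # encoder (not the user) picks its chunk: here, the full string width
    return chunksizes + (char_len,)


print(encode_chunksizes_strict((2,), original_ndim=1, char_len=3))  # (2, 3)
```

Calling it with (2, 3) raises ValueError, matching the view that only the user-visible rank is valid input.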

crusaderky, Jun 14, 2018 13:06

As part of keeping our issue count under 1000, I'm closing this as unlikely to inspire change. Please reopen if anyone disagrees.

max-sixty, Aug 28, 2024 18:08