to_netcdf broken encoding: dtype='S1' + chunksizes
import xarray

xarray.Dataset({'x': ['foo', 'bar', 'baz']}).to_netcdf(
    'foo.nc', engine='h5netcdf',
    encoding={'x': {'dtype': 'S1', 'zlib': True, 'chunksizes': (2, )}})
ValueError: "chunks" must have same rank as dataset shape
Same with engine='netcdf4'. The issue is present in 0.10.6 as well as in 0.10.3.
The problem is evidently that dtype='S1' changes the shape of the variable before it is passed to the backend (each string is split into single characters, which adds a trailing dimension), but any chunksizes setting that was supplied is not adjusted to match.
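To illustrate the shape change, here is a minimal NumPy-only sketch. It is not xarray's actual conversion code, just an analogous reinterpretation of fixed-width byte strings as single characters; the variable names are made up for the example.

```python
import numpy as np

# Fixed-width byte strings, one element per string: rank 1.
strings = np.array([b"foo", b"bar", b"baz"], dtype="S3")

# Reinterpret every 3-byte string as three 1-byte characters.
# Assumption: this mirrors in spirit what a dtype='S1' encoding implies.
chars = strings.view("S1").reshape(strings.shape + (strings.dtype.itemsize,))

print(strings.shape)  # rank 1, what the user sees
print(chars.shape)    # rank 2, what the backend receives
```

A rank-1 chunksizes such as (2,) matches the first array but not the second, which would explain the rank error above.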
The workaround is to omit chunksizes or set it to True.
It looks like this version works:
xarray.Dataset({'x': ['foo', 'bar', 'baz']}).to_netcdf(
'foo.nc', engine='h5netcdf',
encoding={'x': {'dtype': 'S1', 'zlib': True, 'chunksizes': (2, 3)}})
I suppose we could update chunksizes to accept both versions? Or just clearly document this behavior?
IMHO the trick that alters the shape of the array is strictly an implementation detail that should not be exposed to the end user. If xarray's implementation alters the shape of the variable, it should also alter everything that depends on that shape. So I think that chunksizes=(2, 3) should not be accepted as valid input.
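A rough sketch of what that adjustment could look like, assuming the string length is known when the encoding is processed. The helper name `expand_chunksizes` and its parameters are hypothetical and not part of xarray.

```python
def expand_chunksizes(chunksizes, ndim, strlen):
    """Hypothetical helper: append the character dimension to user chunk sizes.

    chunksizes: tuple given by the user, in terms of the original variable
    ndim: rank of the original (string) variable
    strlen: length of the trailing character dimension added by dtype='S1'
    """
    chunksizes = tuple(chunksizes)
    if len(chunksizes) != ndim:
        # Reject anything expressed in terms of the internal char-array shape.
        raise ValueError(
            "chunksizes must have the same rank as the variable: "
            f"expected {ndim}, got {len(chunksizes)}"
        )
    # Keep each string whole within a chunk along the character dimension.
    return chunksizes + (strlen,)


print(expand_chunksizes((2,), ndim=1, strlen=3))
```

With this approach the user-facing chunksizes=(2,) would be translated internally, and (2, 3) would be rejected for a one-dimensional variable.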
As part of keeping our issue count below 1000, I'm closing this as unlikely to inspire change. Please reopen if anyone disagrees.