kerchunk
Error with combining kerchunk mappings with MultiZarrToZarr
I have many *.nc files in a directory that I am opening as a single dataset with kerchunk. Each file has shape (N, 1), where the length-1 dimension is the one I want to concatenate along. Here is a minimal reproducible example:
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr
import xarray as xr
import numpy as np
# create toy data
ds1 = xr.Dataset({'A':xr.DataArray(np.random.rand(1000,1), dims=['x','y'], coords={'x':np.arange(1000),'y':[0]})})
ds2 = xr.Dataset({'A':xr.DataArray(np.random.rand(1000,1), dims=['x','y'], coords={'x':np.arange(1000),'y':[1]})})
ds1.to_netcdf('ds1.nc')
ds2.to_netcdf('ds2.nc')
# create kerchunk mapping
mappings = [SingleHdf5ToZarr('ds1.nc').translate(), SingleHdf5ToZarr('ds2.nc').translate()]
# open dataset with xarray
mzz = MultiZarrToZarr(mappings, concat_dims=['y'], identical_dims=['x']).translate()
so = {'fo': mzz}
ds = xr.open_dataset(
"reference://", engine="zarr", backend_kwargs={"consolidated": False, "storage_options": so}
).chunk({'y':1})
This works, and if I call ds['A'].values, everything is loaded correctly. But if I try to load a single slice along the y dimension (i.e. ds['A'][:,0].values), I get the error:
ValueError: could not broadcast input array from shape (1000,1) into shape (1000,)
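For context, this is the message NumPy raises when an array with a trailing length-1 axis is copied into a buffer that lacks that axis. The sketch below reproduces only the message with plain NumPy. It is an illustration under the assumption that a (1000, 1) chunk is being written into a (1000,) output somewhere in the read path, not a diagnosis of where in the zarr/kerchunk/xarray stack that happens.

```python
import numpy as np

chunk = np.random.rand(1000, 1)   # shape of one stored chunk of A
out = np.empty(1000)              # shape requested by ds['A'][:, 0]

message = None
try:
    out[...] = chunk              # trailing length-1 axis is not dropped implicitly
except ValueError as err:
    message = str(err)
    print(message)

out[...] = chunk[:, 0]            # dropping the axis explicitly succeeds
```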
If I call ds['A'][:,:1] (preserving the y dimension with length one), it works as expected.
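As a stopgap, the working form can be wrapped so the result still comes back 1-D. This is a minimal sketch, not a fix: `load_column` is a hypothetical helper name, and the NumPy array below is only a stand-in so the snippet runs without the .nc files.

```python
import numpy as np

def load_column(arr, i):
    """Hypothetical helper: load index i (non-negative) along the second axis as 1-D."""
    # Slice with i:i+1 so the y axis survives the read (the form that works),
    # then drop that axis in memory once the data is a NumPy array.
    block = np.asarray(arr[:, i:i + 1])
    return block[:, 0]

# Stand-in data with the same layout (x, y)
a = np.arange(6.0).reshape(3, 2)
col = load_column(a, 1)
print(col.shape)
```

With the dataset from the example, the call would be load_column(ds['A'], 0).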
Here are the contents of the mappings variable (the per-file references passed to MultiZarrToZarr):
[{'version': 1,
'refs': {'.zgroup': '{"zarr_format":2}',
'A/.zarray': '{"chunks":[1000,1],"compressor":null,"dtype":"<f8","fill_value":"NaN","filters":null,"order":"C","shape":[1000,1],"zarr_format":2}',
'A/.zattrs': '{"_ARRAY_DIMENSIONS":["x","y"]}',
'A/0.0': ['ds1.nc', 17256, 8000],
'x/.zarray': '{"chunks":[1000],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[1000],"zarr_format":2}',
'x/.zattrs': '{"_ARRAY_DIMENSIONS":["x"]}',
'x/0': ['ds1.nc', 983, 8000],
'y/.zarray': '{"chunks":[1],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[1],"zarr_format":2}',
'y/.zattrs': '{"_ARRAY_DIMENSIONS":["y"]}',
'y/0': '\x00\x00\x00\x00\x00\x00\x00\x00'}},
{'version': 1,
'refs': {'.zgroup': '{"zarr_format":2}',
'A/.zarray': '{"chunks":[1000,1],"compressor":null,"dtype":"<f8","fill_value":"NaN","filters":null,"order":"C","shape":[1000,1],"zarr_format":2}',
'A/.zattrs': '{"_ARRAY_DIMENSIONS":["x","y"]}',
'A/0.0': ['ds2.nc', 17256, 8000],
'x/.zarray': '{"chunks":[1000],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[1000],"zarr_format":2}',
'x/.zattrs': '{"_ARRAY_DIMENSIONS":["x"]}',
'x/0': ['ds2.nc', 983, 8000],
'y/.zarray': '{"chunks":[1],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[1],"zarr_format":2}',
'y/.zattrs': '{"_ARRAY_DIMENSIONS":["y"]}',
'y/0': '\x01\x00\x00\x00\x00\x00\x00\x00'}}]
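A quick sanity check on these references, using only values copied from the listing. It assumes the inlined 'y/0' entries are raw little-endian bytes stored as a string (no base64 prefix), which is how they appear here.

```python
import json
import numpy as np

# Values copied from the reference listing above
y_refs = ['\x00\x00\x00\x00\x00\x00\x00\x00', '\x01\x00\x00\x00\x00\x00\x00\x00']
a_zarray = json.loads(
    '{"chunks":[1000,1],"compressor":null,"dtype":"<f8","fill_value":"NaN",'
    '"filters":null,"order":"C","shape":[1000,1],"zarr_format":2}'
)

# latin-1 maps each character to exactly one byte, recovering the raw chunk bytes
y_values = [int(np.frombuffer(s.encode('latin-1'), dtype='<i8')[0]) for s in y_refs]

# Uncompressed chunk size = number of elements * item size
expected_nbytes = int(np.prod(a_zarray['chunks'])) * np.dtype(a_zarray['dtype']).itemsize

print(y_values, expected_nbytes)
```

The y coordinates decode to 0 and 1, and the expected chunk size matches the 8000-byte length in the 'A/0.0' references, so the per-file references themselves look consistent.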
I'm currently running:
- python: 3.12.4
- xarray: 2024.6.0
- kerchunk: 0.2.6
- zarr: 2.18.2
- numpy: 2.0.0
Thanks!