Error with combining kerchunk mappings with MultiZarrToZarr

Open John-Ragland opened this issue 7 months ago • 2 comments

I have many *.nc files in a directory that I am opening as a single dataset with kerchunk. Each file holds an array of shape (N, 1), and the length-1 dimension is the one I want to concatenate along. Here is a minimal reproducible example:

from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr
import xarray as xr
import numpy as np

# create toy data
ds1 = xr.Dataset({'A':xr.DataArray(np.random.rand(1000,1), dims=['x','y'], coords={'x':np.arange(1000),'y':[0]})})
ds2 = xr.Dataset({'A':xr.DataArray(np.random.rand(1000,1), dims=['x','y'], coords={'x':np.arange(1000),'y':[1]})})

ds1.to_netcdf('ds1.nc')
ds2.to_netcdf('ds2.nc')

# create kerchunk mapping
mappings = [SingleHdf5ToZarr('ds1.nc').translate(), SingleHdf5ToZarr('ds2.nc').translate()]

# combine the per-file references along y, then open the result with xarray
mzz = MultiZarrToZarr(mappings, concat_dims=['y'], identical_dims=['x']).translate()
so = {'fo': mzz}

ds = xr.open_dataset(
    "reference://", engine="zarr", backend_kwargs={"consolidated": False, "storage_options": so}
).chunk({'y':1})

This works, and if I call ds['A'].values, everything loads correctly. But if I try to load a single slice along the y dimension (i.e. ds['A'][:,0].values), I get the error:

ValueError: could not broadcast input array from shape (1000,1) into shape (1000,)

If I call ds['A'][:,:1] (preserving the y dimension with length one), it works as expected.

Here are the contents of the mappings variable (the per-file references passed to MultiZarrToZarr):

[{'version': 1,
  'refs': {'.zgroup': '{"zarr_format":2}',
   'A/.zarray': '{"chunks":[1000,1],"compressor":null,"dtype":"<f8","fill_value":"NaN","filters":null,"order":"C","shape":[1000,1],"zarr_format":2}',
   'A/.zattrs': '{"_ARRAY_DIMENSIONS":["x","y"]}',
   'A/0.0': ['ds1.nc', 17256, 8000],
   'x/.zarray': '{"chunks":[1000],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[1000],"zarr_format":2}',
   'x/.zattrs': '{"_ARRAY_DIMENSIONS":["x"]}',
   'x/0': ['ds1.nc', 983, 8000],
   'y/.zarray': '{"chunks":[1],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[1],"zarr_format":2}',
   'y/.zattrs': '{"_ARRAY_DIMENSIONS":["y"]}',
   'y/0': '\x00\x00\x00\x00\x00\x00\x00\x00'}},
 {'version': 1,
  'refs': {'.zgroup': '{"zarr_format":2}',
   'A/.zarray': '{"chunks":[1000,1],"compressor":null,"dtype":"<f8","fill_value":"NaN","filters":null,"order":"C","shape":[1000,1],"zarr_format":2}',
   'A/.zattrs': '{"_ARRAY_DIMENSIONS":["x","y"]}',
   'A/0.0': ['ds2.nc', 17256, 8000],
   'x/.zarray': '{"chunks":[1000],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[1000],"zarr_format":2}',
   'x/.zattrs': '{"_ARRAY_DIMENSIONS":["x"]}',
   'x/0': ['ds2.nc', 983, 8000],
   'y/.zarray': '{"chunks":[1],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[1],"zarr_format":2}',
   'y/.zattrs': '{"_ARRAY_DIMENSIONS":["y"]}',
   'y/0': '\x01\x00\x00\x00\x00\x00\x00\x00'}}]
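
As a sanity check on the reference dump above, here is a small standard-library sketch that decodes the inline `y/0` values and confirms that the byte length recorded for `A/0.0` matches one uncompressed chunk. The helper names (`decode_inline_i8`, `chunk_nbytes`) are hypothetical, and the sketch assumes the inline string maps one character to one byte, which holds for the values shown here. It does not touch kerchunk, zarr, or xarray, so it says nothing about where the broadcast error originates.

```python
import json
import struct

# Entries copied from the reference dump above, one dict per source file.
A_ZARRAY = (
    '{"chunks":[1000,1],"compressor":null,"dtype":"<f8","fill_value":"NaN",'
    '"filters":null,"order":"C","shape":[1000,1],"zarr_format":2}'
)
refs_by_file = {
    "ds1.nc": {
        "A/.zarray": A_ZARRAY,
        "A/0.0": ["ds1.nc", 17256, 8000],
        "y/0": "\x00\x00\x00\x00\x00\x00\x00\x00",
    },
    "ds2.nc": {
        "A/.zarray": A_ZARRAY,
        "A/0.0": ["ds2.nc", 17256, 8000],
        "y/0": "\x01\x00\x00\x00\x00\x00\x00\x00",
    },
}


def decode_inline_i8(raw):
    """Decode an inline 8-byte chunk as one little-endian int64 ("<i8").

    Assumption: each character of the inline string stands for one byte.
    """
    return struct.unpack("<q", raw.encode("latin-1"))[0]


def chunk_nbytes(zarray_json, itemsize):
    """Size in bytes of one uncompressed chunk described by a .zarray string."""
    meta = json.loads(zarray_json)
    count = 1
    for extent in meta["chunks"]:
        count *= extent
    return count * itemsize


summary = {}
for name, refs in refs_by_file.items():
    y_value = decode_inline_i8(refs["y/0"])
    _, offset, length = refs["A/0.0"]
    # "<f8" is an 8-byte float, so a (1000, 1) chunk should span 8000 bytes.
    size_matches = chunk_nbytes(refs["A/.zarray"], 8) == length
    summary[name] = (y_value, size_matches)
    print(name, "y =", y_value, "chunk size matches:", size_matches)
```

Each file's references describe a (1000, 1) array with a single `y` coordinate value (0 for ds1.nc, 1 for ds2.nc), so the individual per-file references look internally consistent.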

I'm currently running:

  • python: 3.12.4
  • xarray: 2024.6.0
  • kerchunk: 0.2.6
  • zarr: 2.18.2
  • numpy: 2.0.0

Thanks!

John-Ragland, Jul 24 '24 21:07