Basic rechunking example

Open norlandrhagen opened this issue 1 year ago • 9 comments

Working my way through understanding cubed / cubed-xarray.

I'm trying to get an example working that changes the chunking of an Xarray dataset and writes it to Zarr. When I round-trip the Zarr store to and from Xarray, the chunking structure doesn't seem to have changed. Is using the .chunk method on an Xarray dataset with cubed viable, or should I be using the rechunk primitive?

Roundtrip example using Xarray + dask chunks


import xarray as xr
from zarr.storage import TempStore

ts = TempStore('air_temp_dask.zarr')

# Open lazily with dask, then rechunk to one time step per chunk
ds = xr.tutorial.open_dataset('air_temperature', chunks={})
rds = ds.chunk({'time': 1})
rds.to_zarr(ts, consolidated=True)

# Read the store back and compare its chunking with what was written
rtds = xr.open_zarr(ts, chunks={})
rtds

assert rtds.chunks == rds.chunks
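
For readers unfamiliar with what the assertion compares: `Dataset.chunks` maps each dimension name to a tuple of block lengths along that dimension. The sketch below shows how a uniform chunk size expands into such a tuple. `regular_chunks` is a hypothetical helper written for illustration, not part of xarray, dask, or cubed, and the sizes are made up.

```python
from typing import Tuple


def regular_chunks(dim_size: int, chunk_size: int) -> Tuple[int, ...]:
    """Block lengths along one dimension when split into chunks of chunk_size."""
    if dim_size <= 0 or chunk_size <= 0:
        raise ValueError("dim_size and chunk_size must both be positive")
    full, remainder = divmod(dim_size, chunk_size)
    # Full-size blocks first, then one shorter trailing block if needed.
    return (chunk_size,) * full + ((remainder,) if remainder else ())


# Hypothetical sizes for illustration, not taken from the dataset above.
print(regular_chunks(10, 4))  # (4, 4, 2)
print(regular_chunks(5, 1))   # (1, 1, 1, 1, 1)
```

Under this reading, `.chunk({'time': 1})` should give a `time` entry made up entirely of ones, with as many entries as there are time steps.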

Roundtrip example using Xarray + cubed

from cubed import Spec
import xarray as xr
from zarr.storage import TempStore

ts = TempStore('air_temp_cubed.zarr')

spec = Spec(work_dir='tmp', allowed_mem='2GB')
ds = xr.tutorial.open_dataset(
    'air_temperature',
    chunked_array_type='cubed',
    from_array_kwargs={'spec': spec},
    chunks={},
)

rds = ds.chunk({'time': 1}, chunked_array_type='cubed')

# does compute need to be called?
# rds.compute()

rds.to_zarr(
    ts,
    consolidated=True,
    chunkmanager_store_kwargs={'from_array_kwargs': {'spec': spec}},
)

rtds = xr.open_zarr(
    ts,
    chunked_array_type='cubed',
    from_array_kwargs={'spec': spec},
    chunks={},
)

# This fails
assert rtds.chunks == rds.chunks

chunked dataset (rds): image

roundtripped dataset (rtds): image
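
A bare `assert` on two chunk mappings does not say which dimension disagrees. As a minimal sketch, assuming both sides are mappings shaped like `Dataset.chunks` (dimension name to tuple of block lengths), a small helper can report only the differences. `diff_chunks` is a hypothetical name, and the example values are invented, not read from the datasets in this issue.

```python
from typing import Dict, Hashable, Mapping, Optional, Tuple

Chunks = Tuple[int, ...]


def diff_chunks(
    expected: Mapping[Hashable, Chunks],
    actual: Mapping[Hashable, Chunks],
) -> Dict[Hashable, Tuple[Optional[Chunks], Optional[Chunks]]]:
    """Return {dim: (expected, actual)} for every dimension whose chunks differ."""
    diffs = {}
    # Union of keys so a dimension missing on either side is reported too.
    for dim in sorted(set(expected) | set(actual), key=str):
        exp, act = expected.get(dim), actual.get(dim)
        if exp != act:
            diffs[dim] = (exp, act)
    return diffs


# Hypothetical chunk mappings for illustration only.
requested = {"time": (1, 1, 1, 1), "lat": (25,), "lon": (53,)}
roundtripped = {"time": (4,), "lat": (25,), "lon": (53,)}

print(diff_chunks(requested, roundtripped))
# {'time': ((1, 1, 1, 1), (4,))}
```

With real datasets the call would be something like `diff_chunks(rds.chunks, rtds.chunks)`. An empty result means the two chunkings match.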

🤞 this is an end-of-day brain implementation issue on my end.

norlandrhagen · Aug 05 '24 23:08