Computations requiring irregularly-chunked Zarr stores

Open TomNicholas opened this issue 2 years ago • 2 comments

All intermediate results in Cubed are written out to persistent storage via Zarr, but currently Zarr can't represent every chunked array, because the Zarr spec does not yet support irregular chunks.
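
To make the constraint concrete, here is a minimal sketch in plain Python of what "regular" means along a single axis: every chunk has the same size, except that the final chunk may be smaller. The helper name `is_regular` is hypothetical and is not part of Cubed or Zarr.

```python
def is_regular(chunk_sizes):
    """Return True if the chunk sizes along one axis fit a regular grid.

    Regular here means: all chunks are equal, except the last one,
    which may be smaller (but not empty).
    """
    if len(chunk_sizes) <= 1:
        return True
    first = chunk_sizes[0]
    # Every chunk except the last must match the first chunk's size.
    body_ok = all(c == first for c in chunk_sizes[:-1])
    # The trailing chunk may be short, but never larger than the rest.
    return body_ok and 0 < chunk_sizes[-1] <= first


print(is_regular((4, 4, 4, 2)))  # regular: short trailing chunk is allowed
print(is_regular((1, 4, 4, 4)))  # irregular: short chunk at the start
```

A chunking like `(1, 4, 4, 4)` is the kind that cannot be expressed by a single chunk size per axis.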

This comes up in Cubed when a computation changes the chunking of an array from regularly chunked to irregularly chunked. An important example of such a computation is groupby; see https://github.com/tomwhite/cubed/issues/223.

I'm not sure whether there are other array operations that might change regular chunks into irregular ones. np.pad comes to mind?

TomNicholas avatar Sep 20 '23 15:09 TomNicholas

See discussion on ZEP003

TomNicholas avatar Sep 20 '23 16:09 TomNicholas

You could implement pad in Cubed using concatenate, which already exists but has to copy the Zarr arrays to make them regularly-chunked. But if Zarr supported irregular chunking then it could be more efficient.
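
As a rough illustration of why the copy is needed, the sketch below tracks only chunk sizes along one axis, in plain Python. It is not how Cubed's concatenate is implemented; the helper names `regular_chunks` and `concat_chunks` and the sizes used are assumptions made for the example.

```python
def regular_chunks(length, chunk_size):
    """Chunk sizes for a regularly chunked axis of the given length."""
    full, rem = divmod(length, chunk_size)
    return (chunk_size,) * full + ((rem,) if rem else ())


def concat_chunks(*chunk_tuples):
    """Chunk sizes along the concatenation axis if the existing chunks
    were reused as-is, with no copying or rechunking."""
    return tuple(c for chunks in chunk_tuples for c in chunks)


# A length-10 axis stored with chunk size 4.
data = regular_chunks(10, 4)
print(data)  # (4, 4, 2)

# Pad with 1 element before and 2 after, expressed as a concatenation
# of three arrays: [pad_before, data, pad_after].
padded = concat_chunks((1,), data, (2,))
print(padded)  # (1, 4, 4, 2, 2)

# A regular chunk grid for the same padded length needs different chunk
# boundaries, so the data has to be copied into new chunks.
target = regular_chunks(sum(padded), 4)
print(target)  # (4, 4, 4, 1)
```

With irregular chunk support, the `(1, 4, 4, 2, 2)` layout could in principle be stored directly, leaving the original chunks untouched.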

tomwhite avatar Sep 20 '23 16:09 tomwhite