WIP: Add virtual-rechunk example
Rechunk a virtual dataset
This example demonstrates how to rechunk a collection of netCDF files on S3 into a single zarr store.
First, lithops and VirtualiZarr construct a virtual dataset composed of the netCDF files on S3. Then, xarray-cubed rechunks the virtual dataset into a zarr store. Inspired by the Pythia cookbook by @norlandrhagen.
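A minimal sketch of the two steps above, assuming VirtualiZarr, xarray, kerchunk, and cubed are installed. The URLs, dimension names, and chunk sizes are placeholders, not the example's actual values, and the lithops fan-out of step 1 is elided:

```python
import xarray as xr
from virtualizarr import open_virtual_dataset

# 1. Build a virtual dataset over the netCDF files on S3
#    (the real example parallelizes this step with lithops).
urls = ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]  # placeholder URLs
virtual = xr.concat(
    [open_virtual_dataset(u, indexes={}) for u in urls],
    dim="time", coords="minimal", compat="override",
)

# Persist the virtual references so the dataset can be reopened lazily.
virtual.virtualize.to_kerchunk("combined.json", format="json")

# 2. Reopen through the references with cubed as the chunked-array backend,
#    rechunk, and write a concrete zarr store.
ds = xr.open_dataset(
    "combined.json", engine="kerchunk",
    chunked_array_type="cubed", chunks={},
)
ds.chunk({"time": 100}).to_zarr("s3://bucket/rechunked.zarr", mode="w")
```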
STATUS
I'm pretty sure I got this workflow working, albeit slowly; however, I'm now hitting a new AttributeError. Details below.
PLANNING
Rechunking has been a thorn in the side for many of us, and I think there's general interest in a serverless workflow. It remains to be seen whether this example should live as part of cubed or as part of a pangeo community of practice. Once this example is working again, the next two steps are:
- Increase the chunk size to ~100MB, which might involve finding a better demo dataset. The demo's chunks are currently much smaller than that, which hurts performance.
- Explore how difficult it would be to alter cubed's rechunk algorithm so that each worker writes multiple chunks, just as `rechunker` does.
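To make the first planning item concrete, here is a back-of-the-envelope helper for sizing chunks toward the ~100MB target. The shapes and the float64 assumption are hypothetical, for illustration only:

```python
def chunk_nbytes(shape, itemsize=8):
    """Bytes in one chunk of the given shape (itemsize=8 assumes float64)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize

# A hypothetical (100, 360, 360) float64 chunk lands near the target:
mb = chunk_nbytes((100, 360, 360)) / 1e6
print(round(mb, 1))  # 103.7 (MB)
```

Inverting this arithmetic (pick dimension sizes whose product times the itemsize is ~1e8) is one way to choose chunk shapes for a candidate demo dataset.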