CLI just for rechunking?

Open TomNicholas opened this issue 2 months ago • 2 comments

Rechunking is such a common workload, and Cubed can crush it. But many people who want to rechunk Zarr data don't necessarily want to get into Xarray or Python. Could we make a small standalone CLI tool that calls cubed.rechunk under the hood? Users could then run it with, e.g., uvx. It could live here, or maybe in zarr-python.

The result would effectively expose only the part of Cubed that corresponds to the original rechunker package.
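
To make the proposal concrete, here is a minimal sketch of what such a wrapper could look like. The command name `cubed-rechunk`, the flags, and the `2GB` memory default are all hypothetical and chosen only for illustration. The sketch assumes `cubed.from_zarr`, `cubed.rechunk`, `cubed.to_zarr`, and `cubed.Spec` behave as described in the Cubed API docs.

```python
import argparse


def parse_chunks(text):
    """Turn a string such as "100,200" into the tuple (100, 200)."""
    return tuple(int(part) for part in text.split(","))


def build_parser():
    # Hypothetical command name and flags; this is not an existing CLI.
    parser = argparse.ArgumentParser(prog="cubed-rechunk")
    parser.add_argument("source", help="path or URL of the source Zarr array")
    parser.add_argument("target", help="path or URL for the rechunked Zarr array")
    parser.add_argument("--chunks", type=parse_chunks, required=True,
                        help="target chunk shape, e.g. 100,200")
    parser.add_argument("--work-dir", default=None,
                        help="directory for intermediate data")
    # The default below is an arbitrary choice for this sketch.
    parser.add_argument("--allowed-mem", default="2GB",
                        help="memory budget per task")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)

    # Imported lazily so that --help works without loading Cubed.
    import cubed

    spec = cubed.Spec(work_dir=args.work_dir, allowed_mem=args.allowed_mem)
    source = cubed.from_zarr(args.source, spec=spec)
    rechunked = cubed.rechunk(source, args.chunks)
    # to_zarr executes the computation by default and writes the result.
    cubed.to_zarr(rechunked, args.target)
```

If `main` were registered as a console-script entry point in a package, an invocation might look like `cubed-rechunk in.zarr out.zarr --chunks 100,200`. Executor selection and multi-array groups are left out of this sketch.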

cc @d-v-b

TomNicholas • Oct 17 '25 12:10

Great idea!

(BTW I've just enabled Discussions in this repo for ideas like this - although maybe this one is fine as an issue.)

tomwhite • Oct 20 '25 09:10

A few ideas floating around my head about this:

  • The zarrs_tools CLI has a re-encode command that does more than rechunking. This is probably worth examining (or even wrapping!).
  • We could use the proposed suffix chunk key encoding to do rechunking, or any other chunk transformation, in place by using a special suffix for intermediate chunks. We could also define a special store class that recognizes suffixed zarr.json documents, so multiple arrays could essentially coexist under the same prefix until the transformation is complete, at which point you would do a cleanup.

d-v-b • Oct 20 '25 09:10