cubed
CLI just for rechunking?
Rechunking is such a common workload, and Cubed can crush it. But many people who want to rechunk Zarr data don't necessarily want to get into Xarray or Python. Could we make a small standalone CLI tool that calls `cubed.rechunk` under the hood? Then users could run it, e.g. using `uvx`. It could live here, or maybe in zarr-python.
The result would effectively expose only the part of Cubed that corresponds to the original rechunker package.
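To make the idea concrete, here is a rough sketch of what such a tool might look like. The command name, the arguments, and the flags (`--chunks`, `--allowed-mem`, `--work-dir`) are all made up for illustration, and the Cubed calls (`cubed.Spec`, `cubed.from_zarr`, `cubed.rechunk`, `cubed.to_zarr`) are written as I understand them, so they should be checked against the installed version.

```python
import argparse


def parse_chunks(text):
    """Turn a string such as "100,200" into a tuple of ints."""
    return tuple(int(part) for part in text.split(","))


def build_parser():
    # All argument names here are hypothetical, not an agreed interface.
    parser = argparse.ArgumentParser(
        prog="rechunk", description="Rechunk a Zarr array with Cubed."
    )
    parser.add_argument("source", help="path or URL of the input Zarr array")
    parser.add_argument("target", help="path or URL of the output Zarr array")
    parser.add_argument(
        "--chunks",
        type=parse_chunks,
        required=True,
        help="target chunk shape, comma separated, e.g. 100,200",
    )
    parser.add_argument("--allowed-mem", default="2GB", help="memory budget per task")
    parser.add_argument("--work-dir", default=None, help="where intermediate data goes")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)

    # Imported lazily so that --help works without Cubed installed.
    import cubed

    # Assumption: these calls accept the arguments shown here.
    spec = cubed.Spec(work_dir=args.work_dir, allowed_mem=args.allowed_mem)
    source = cubed.from_zarr(args.source, spec=spec)
    rechunked = cubed.rechunk(source, args.chunks)
    cubed.to_zarr(rechunked, args.target)
```

With `main` registered as a console script entry point, usage could then look something like `uvx <package> in.zarr out.zarr --chunks 100,200`, where the package name is still to be decided.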
cc @d-v-b
Great idea!
(BTW I've just enabled Discussions in this repo for ideas like this - although maybe this one is fine as an issue.)
A few ideas floating around my head about this:
- The `zarrs_tools` CLI has a re-encode command that does more than rechunking. This is probably worth examining (or even wrapping!)
- We could use the proposed `suffix` chunk key encoding to do re-chunking or any other chunk transformation in-place, by using a special suffix for intermediate chunks. We could also define a special store class that recognizes suffixed `zarr.json` documents, so you could essentially have multiple arrays co-existing under the same prefix until the transformation is complete, at which point you would do a cleanup
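To illustrate the second idea, here is a toy model of the two phases (write suffixed intermediates, then clean up), using a plain dict as the store. Everything in it is an assumption made for the sake of the example: the `.rechunk` suffix, the key layout, and the simplified metadata document are not the proposed encoding or real Zarr metadata, and the store is assumed to hold only this one 1D array.

```python
import json

SUFFIX = ".rechunk"  # hypothetical suffix marking intermediate keys


def write_array(store, values, chunk_size, suffix=""):
    """Write a 1D list as chunks plus a (toy) metadata document."""
    meta = {"shape": [len(values)], "chunk_size": chunk_size}
    store["zarr.json" + suffix] = json.dumps(meta)
    for start in range(0, len(values), chunk_size):
        key = f"c/{start // chunk_size}{suffix}"
        store[key] = json.dumps(values[start:start + chunk_size])


def read_array(store, suffix=""):
    """Read the array described by the metadata document with this suffix."""
    meta = json.loads(store["zarr.json" + suffix])
    length, chunk_size = meta["shape"][0], meta["chunk_size"]
    n_chunks = -(-length // chunk_size)  # ceiling division
    values = []
    for i in range(n_chunks):
        values.extend(json.loads(store[f"c/{i}{suffix}"]))
    return values


def rechunk_in_place(store, new_chunk_size):
    # Phase 1: write the new array next to the old one under suffixed keys.
    # (A real tool would stream chunk by chunk rather than load everything.)
    write_array(store, read_array(store), new_chunk_size, suffix=SUFFIX)

    # Phase 2: cleanup. Drop the old keys, then strip the suffix from the new ones.
    for key in [k for k in store if not k.endswith(SUFFIX)]:
        del store[key]
    for key in [k for k in store if k.endswith(SUFFIX)]:
        store[key[: -len(SUFFIX)]] = store.pop(key)
```

Between the two phases both arrays co-exist under the same prefix, which is the state a suffix-aware store class would need to understand.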