Basic command line tools -- copy, remove, convert, rechunk
I think it could be helpful to expose basic hierarchy transformations to users via a CLI. Usage would look something like:
zarr cp src dest # efficient array / group copy
zarr rm path # efficient array / group removal
zarr convert path_to_array --inplace --chunks=10,10,10 --compressor=gzip # efficient in-place rechunking and recompression (this is a reach goal)
We could start with serial (painfully slow) versions of these operations to get the interfaces sorted out, then improve performance as needed.
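To make the interface discussion concrete, here is a rough sketch of how the three subcommands above could be parsed with the standard library's argparse. This covers only the argument surface, not an implementation: the function names `build_parser` and `parse_chunks` are hypothetical, and the option handling (for example, treating `--chunks` as a comma-separated list of integers) is an assumption based on the usage lines above.

```python
import argparse


def parse_chunks(text: str) -> tuple:
    """Turn a string such as "10,10,10" into a tuple of ints (assumed format)."""
    return tuple(int(part) for part in text.split(","))


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the proposed `zarr` CLI (interface sketch only)."""
    parser = argparse.ArgumentParser(prog="zarr")
    sub = parser.add_subparsers(dest="command", required=True)

    # zarr cp src dest
    cp = sub.add_parser("cp", help="copy an array or group")
    cp.add_argument("src")
    cp.add_argument("dest")

    # zarr rm path
    rm = sub.add_parser("rm", help="remove an array or group")
    rm.add_argument("path")

    # zarr convert path_to_array --inplace --chunks=10,10,10 --compressor=gzip
    convert = sub.add_parser("convert", help="rechunk and/or recompress an array")
    convert.add_argument("path_to_array")
    convert.add_argument("--inplace", action="store_true")
    convert.add_argument("--chunks", type=parse_chunks, default=None)
    convert.add_argument("--compressor", default=None)

    return parser
```

Calling `build_parser().parse_args([...])` returns a namespace whose `command` attribute names the chosen subcommand, so the serial implementations could be dispatched on that value and swapped out later without changing the interface.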
Once these commands exist, we should add instructions to the docs for using pipx to run them.
Does anyone else think this is a good idea?
I like this, but I question whether it belongs in zarr-python or in a downstream package, which could one day become a sibling package if, for example, it gets rewritten in Rust to use the Rust zarr implementation.
We already have basic store -> store copying functionality in convenience.py, and rm is already defined on the store classes, so at least for these two the only big change would be writing CLIs for this functionality.
Great minds :) https://github.com/joshmoore/zarr-utils/tree/master/zarr_utils
Knowing that the convenience functions needed work (https://github.com/joshmoore/zarr-utils/issues), I was actually trying to find a new home for them. So:
- :+1: for the idea.
- :+1: for a separate package (a la h5utils, etc. Conceivably this package could fall back to a native implementation if available)
- :+1: for more testing/improvements on the copying functionality.
I also noticed the convenience functions could be faster. You may want to look into adding some concurrency there, especially for remote stores. Because of this, we usually copy with gsutil instead, which is faster most of the time. We haven't benchmarked local disk, but it likely won't be an issue there.
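As a rough illustration of the kind of concurrency I mean, here is a minimal sketch that copies keys between two stores with a thread pool. It assumes the stores behave like key-to-bytes mappings (as zarr-python v2 stores do) and that the destination tolerates concurrent writes. The function name `copy_keys_concurrently` and the `max_workers` default are hypothetical, not existing zarr API.

```python
from collections.abc import Mapping, MutableMapping
from concurrent.futures import ThreadPoolExecutor


def copy_keys_concurrently(
    source: Mapping, dest: MutableMapping, max_workers: int = 8
) -> int:
    """Copy every key from source to dest using threads; return the key count.

    Assumes dest tolerates concurrent writes (an assumption, not a guarantee).
    """

    def copy_one(key):
        # Each key is an independent read followed by a write, so for remote
        # stores the network round trips can overlap across threads.
        dest[key] = source[key]
        return key

    keys = list(source)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the iterator so any worker exception is raised here.
        copied = list(pool.map(copy_one, keys))
    return len(copied)
```

Threads should help most when each key access is dominated by network latency; for local disk the gain may be small, which would need benchmarking.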