Basic command line tools -- copy, remove, convert, rechunk

Open d-v-b opened this issue 2 years ago • 4 comments

I think it could be helpful to expose basic hierarchy transformations to users via a CLI. Usage would look something like

    zarr cp src dest  # efficient array / group copy
    zarr rm path  # efficient array / group removal
    zarr convert path_to_array --inplace --chunks=10,10,10 --compressor=gzip  # efficient in-place rechunking and recompression (this is a reach goal)

We could start with serial (painfully slow) versions of these operations to get the interfaces sorted out, then improve performance as needed.
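To make the interface discussion concrete, here is a minimal sketch of how the three subcommands could be parsed with `argparse`. It only covers argument parsing and does no copying, removal, or rechunking. The function names `build_parser` and `parse_chunks` are hypothetical, and the flag set simply mirrors the usage example above; none of this is an existing zarr-python API.

```python
import argparse


def parse_chunks(text):
    """Turn a string such as '10,10,10' into a tuple of ints."""
    return tuple(int(part) for part in text.split(","))


def build_parser():
    """Build a parser for the proposed (hypothetical) `zarr` command."""
    parser = argparse.ArgumentParser(prog="zarr")
    sub = parser.add_subparsers(dest="command", required=True)

    # zarr cp src dest
    cp = sub.add_parser("cp", help="copy an array or group")
    cp.add_argument("src")
    cp.add_argument("dest")

    # zarr rm path
    rm = sub.add_parser("rm", help="remove an array or group")
    rm.add_argument("path")

    # zarr convert path --inplace --chunks=10,10,10 --compressor=gzip
    convert = sub.add_parser("convert", help="rechunk / recompress an array")
    convert.add_argument("path")
    convert.add_argument("--inplace", action="store_true")
    convert.add_argument("--chunks", type=parse_chunks, default=None)
    convert.add_argument("--compressor", default=None)
    return parser


# Parse the reach-goal example from the proposal (explicit argv, no sys.argv).
args = build_parser().parse_args(
    ["convert", "path_to_array", "--inplace", "--chunks=10,10,10", "--compressor=gzip"]
)
print(args.command, args.chunks, args.compressor)
```

Keeping the parser separate from the operations would let the serial implementations be swapped for faster ones later without touching the interface.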

Once these commands exist, we should add instructions to the docs for using pipx to run them.

Does anyone else think this is a good idea?

d-v-b avatar Aug 28 '23 15:08 d-v-b

I like this, but I question whether it belongs in zarr-python or in a downstream package, which could one day become a sibling package if, for example, it gets rewritten in Rust to use the Rust Zarr implementation.

jni avatar Aug 28 '23 19:08 jni

We already have basic store -> store copying functionality in convenience.py, and rm is already defined on the store classes, so at least for these two the only big change would be writing CLIs for this functionality.

d-v-b avatar Aug 28 '23 19:08 d-v-b

Great minds :) https://github.com/joshmoore/zarr-utils/tree/master/zarr_utils

Knowing that the convenience functions needed work (https://github.com/joshmoore/zarr-utils/issues), I was actually trying to find a new home for them. So:

  • :+1: for the idea.
  • :+1: for a separate package (a la h5utils, etc. Conceivably this package could fall back to a native implementation if available)
  • :+1: for more testing/improvements on the copying functionality.

joshmoore avatar Aug 28 '23 19:08 joshmoore

I also noticed the convenience functions could be faster. You may want to look into adding some concurrency there, especially for remote stores. Because of this, we usually copy with gsutil instead, which is faster most of the time. We haven't benchmarked local disk, but it likely won't be an issue there.
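As a rough illustration of the concurrency idea, the sketch below copies keys between two key-value stores with a thread pool. It assumes the stores behave like mappings from string keys to bytes and tolerate concurrent reads and writes; plain dicts stand in for remote stores here. `copy_keys_concurrently` is a hypothetical helper, not something that exists in convenience.py.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_keys_concurrently(source, dest, max_workers=8):
    """Copy every key from `source` to `dest` using a thread pool.

    Both arguments are assumed to be mapping-like stores (key -> bytes).
    Returns the number of keys copied.
    """
    keys = list(source)

    def copy_one(key):
        # For a remote store, this read and write is where the latency is,
        # so overlapping many of them in threads is what saves time.
        dest[key] = source[key]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so an exception in any worker is re-raised here.
        list(pool.map(copy_one, keys))
    return len(keys)


# Plain dicts as stand-ins for real stores.
source = {"arr/.zarray": b"{}", "arr/0.0": b"\x00" * 16, "arr/0.1": b"\x01" * 16}
dest = {}
copied = copy_keys_concurrently(source, dest)
print(copied, sorted(dest))
```

Threads suit this case because the work is I/O-bound; whether it actually beats a tool like gsutil would need benchmarking against a real remote store.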

tasansal avatar Sep 19 '23 02:09 tasansal