zarrs_tools

RFE: Object storage support

Open joshmoore opened this issue 1 year ago • 2 comments

Have you considered, e.g., reading from and writing to S3?

In the resave.py script we are working on for the challenge, options like these:

time ./resave.py \
        zarr/v0.4/idr0001A/2551.zarr \
        --input-bucket=idr \
        --input-endpoint=https://uk1s3.embassy.ebi.ac.uk \
        --input-anon \
        ...

avoid the need to download the data locally.

I'm currently working on using zarrs_reencode by generating a script:

./resave.py zarr/v0.4/idr0001A/2551.zarr --output-script ...

which produces a script per Zarr array of the form:

zarrs_reencode --chunk-shape 1,1,1040,1376 --shard-shape 2,16,1040,1376 --dimension-names c,z,y,x --validate \
    zarr/v0.4/idr0001A/2551.zarr/C/3/0 OUTPUT/C/3/0

but this of course won't work when the source or target are on S3.

joshmoore, Jul 30 '24 14:07

I've added read support for HTTP stores in zarrs_tools version 0.5.5 with #11.

zarrs_reencode https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr/C/3/0/0 2551.zarr/C/3/0/0

Writing to remote stores is not supported, and I am not sure it is worth adding support given the complexity of supporting many different services + auth. Eventually zarrs itself will have a Python wrapper for more flexible usage.
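The HTTP read path above could be wired into a script generator roughly as follows. This is a minimal sketch, not part of zarrs_tools or resave.py: `http_array_url` and `reencode_command` are hypothetical helper names, and the path-style layout (endpoint, then bucket, then key) is an assumption inferred from the example URL in this comment. Only the input is remote; the output stays on the local filesystem, since remote writes are not supported.

```python
import shlex
from urllib.parse import quote


def http_array_url(endpoint, bucket, key):
    """Join endpoint, bucket and key into a path-style HTTP URL.

    Assumption: the bucket is publicly readable over plain HTTP(S) at
    <endpoint>/<bucket>/<key>, as in the example URL in this thread.
    """
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key.strip('/'))}"


def reencode_command(src_url, dst_path, options=()):
    """Build a shell-quoted zarrs_reencode command: options, input, output."""
    return shlex.join(["zarrs_reencode", *options, src_url, dst_path])


# Hypothetical usage with the values quoted in this thread.
url = http_array_url(
    "https://uk1s3.embassy.ebi.ac.uk",
    "idr",
    "zarr/v0.4/idr0001A/2551.zarr/C/3/0/0",
)
print(reencode_command(url, "2551.zarr/C/3/0/0"))
```

Using `shlex.join` keeps the generated line safe to paste into a shell script even if a path ever contains spaces or other shell metacharacters.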

P.S. That location reports missing chunks as permission denied when interpreted as an S3 endpoint. I'm not sure if that is standard. HTTP is fine.
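For context on the permission-denied behaviour: AWS S3 documents returning 403 rather than 404 for a nonexistent key when the caller lacks list permission on the bucket, and other S3-compatible providers may or may not follow suit. A client that treats absent chunks as "use the fill value" therefore has to decide how to read a 403. The sketch below shows one way to make that choice explicit; `chunk_state` is a hypothetical helper, not how zarrs handles this internally.

```python
def chunk_state(status, forbidden_means_missing=True):
    """Classify an HTTP status code returned for a chunk request.

    Assumption: on some S3-style endpoints a missing key is reported as
    403 (permission denied) instead of 404, so 403 is optionally treated
    as "missing" rather than as a hard error.
    """
    if 200 <= status < 300:
        return "present"
    if status == 404:
        return "missing"
    if status == 403 and forbidden_means_missing:
        return "missing"
    return "error"


# Hypothetical usage: the same 403 read two ways.
print(chunk_state(403))
print(chunk_state(403, forbidden_means_missing=False))
```

The trade-off is that treating 403 as missing can hide a genuine credentials problem, so the flag is best enabled only for endpoints known to behave this way.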

LDeakin, Jul 31 '24 01:07

> I've added read support for HTTP stores in zarrs_tools version 0.5.5 with https://github.com/LDeakin/zarrs_tools/pull/11.

🤯 Amazing. I'll give it a try ASAP.

> Writing to remote stores is not supported, and I am not sure it is worth adding support given the complexity of supporting many different services + auth.

Understood. Certainly one tricky aspect of all of this.

> That location reports missing chunks as permission denied when interpreted as an S3 endpoint. I'm not sure if that is standard.

Heh. Since there's not really a standard, I agree. :) It's definitely my experience that each provider of "S3" has a slightly different take.

> HTTP is fine.

👍

joshmoore, Jul 31 '24 06:07