
[v3] remote store support (s3, gcs, azure, http)

Open jhamman opened this issue 1 year ago • 7 comments

The v3 branch currently has an initial (but broken) implementation of a remote store. This issue tracks getting that operational so we can start using v3 against s3, gcs, azure, http, etc.

TODOs:

  • [ ] rename to FsspecStore
  • [ ] limit store to only fsspec implementations that are async-friendly
  • [ ] decide on approach to testing
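
A rough sketch of how the second item could be enforced, assuming the store receives an fsspec filesystem instance: fsspec filesystem classes carry an `async_impl` flag (`True` on `AsyncFileSystem` subclasses), so the constructor could reject anything without it. The `require_async` helper and the two stand-in classes are hypothetical, used only so the snippet runs without fsspec installed.

```python
from typing import Any


def require_async(fs: Any) -> Any:
    """Return fs unchanged if it advertises async support, else raise."""
    # fsspec sets `async_impl = True` on AsyncFileSystem subclasses;
    # purely synchronous filesystems leave it False.
    if not getattr(fs, "async_impl", False):
        raise TypeError(f"{type(fs).__name__} is not async-friendly")
    return fs


# Hypothetical stand-ins for real fsspec filesystem instances.
class FakeAsyncFS:
    async_impl = True


class FakeSyncFS:
    async_impl = False
```

Calling `require_async(FakeSyncFS())` raises `TypeError`, while an async-capable filesystem is passed through untouched.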

@martindurant - any chance you can lend a hand on this one? I optimistically put you in as the assignee but let us know if this is not something you can engage on.

Blocked by #1755

jhamman avatar Apr 05 '24 18:04 jhamman

Hi, I'm watching. I can help, but I don't have a huge number of hours, of course.

I notice the "blocked by", and there has been no progress on that issue yet.

martindurant avatar Apr 10 '24 13:04 martindurant

Personally I don't think we need to wait for a final store API before implementing remote storage. We already implement local storage for v3, so we should just make a remote API that matches it.

Any friction generated in that process can go towards shaping the final API :)
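
To illustrate the idea of a remote store that mirrors the local one, here is a minimal sketch. It is not zarr's actual store API: the `Store` protocol, `DictStore`, `PrefixedStore`, and `roundtrip` are all hypothetical names, and the "remote" backend is just a dict standing in for a real filesystem.

```python
import asyncio
from typing import Optional, Protocol


class Store(Protocol):
    """Hypothetical minimal async store interface (not zarr's actual API)."""

    async def get(self, key: str) -> Optional[bytes]: ...
    async def set(self, key: str, value: bytes) -> None: ...


class DictStore:
    """Stand-in for a local store: keeps values in a dict."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class PrefixedStore:
    """Stand-in for a remote store: same methods, keys live under a URL-like prefix."""

    def __init__(self, prefix: str, backend: dict[str, bytes]) -> None:
        self._prefix = prefix.rstrip("/")
        self._backend = backend  # a real store would call the remote filesystem here

    async def get(self, key: str) -> Optional[bytes]:
        return self._backend.get(f"{self._prefix}/{key}")

    async def set(self, key: str, value: bytes) -> None:
        self._backend[f"{self._prefix}/{key}"] = value


async def roundtrip(store: Store, key: str, value: bytes) -> Optional[bytes]:
    """Caller code that works unchanged against either store."""
    await store.set(key, value)
    return await store.get(key)
```

Because both classes satisfy the same protocol, calling code such as `asyncio.run(roundtrip(store, "zarr.json", b"{}"))` does not need to know which kind of store it was handed.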

d-v-b avatar Apr 10 '24 14:04 d-v-b

Is the use of universal_pathlib desired/required?

martindurant avatar Apr 10 '24 19:04 martindurant

@martindurant universal_pathlib was carried over from zarrita; maybe @normanrz can weigh in on the pros and cons of keeping it in.

d-v-b avatar Apr 11 '24 07:04 d-v-b

I think upath is quite useful for composing paths. It would be great if FsspecStore supported opening from a UPath object. Maybe it should be an optional dependency, though.

normanrz avatar Apr 12 '24 08:04 normanrz

OK, I can have it optional. Honestly I doubt we'll see it much, since typical invocation would be like zarr.open("s3://bucket/data.zarr") (which already works).

martindurant avatar Apr 12 '24 13:04 martindurant

UPath becomes interesting if you store some storage_options like this and reuse them in different places:

import zarr
from upath import UPath

# Credentials are attached once and carried along to every derived path.
path = UPath("s3://bucket", key='some', secret='credentials')
z1 = zarr.open(path / "data.zarr")
z2 = zarr.open(path / "other.zarr")

normanrz avatar Apr 12 '24 13:04 normanrz