Add Zarr compatibility functions
This is a continuation of https://github.com/fsspec/kerchunk/pull/475 because I identified that there are several more behavior changes in zarr v3. Going forward, the utils.py file will abstract away the differences between ZarrV2 and ZarrV3 depending on what the user has installed.
Note that @jhamman's PR https://github.com/fsspec/kerchunk/pull/292 may now be outdated, though I'll merge those commits into this PR if its desired to expose a zarr_version to kerchunk callers.
Thank you for putting this together, you have clearly done a careful job (and thanks to @jhamman for the earlier attempt). This will take some time to go through.
I would encourage you to also look at the implementation in fsspec.implementations.reference, where I expect there will be more zarr2-specific assumptions, particularly in the parquet implementation.
Following @TomAugspurger 's comments, it may well make sense to have None mean V2 for now, whichever zarr version is installed
Thank you for putting this together, you have clearly done a careful job (and thanks to @jhamman for the earlier attempt). This will take some time to go through.
I would encourage you to also look at the implementation in fsspec.implementations.reference, where I expect there will be more zarr2-specific assumptions, particularly in the parquet implementation.
In fsspec I found that fsspec.mapping.FSMap or LazyReferenceMapper objects are passed to zarr.open and zarr.group calls. In ZarrV2, that worked because both of those types are MutableMappings and ZarrV2 knew how to work with MutableMapping stores.
However in ZarrV3, you need to pass a StoreLike object to any of the functions that previously expected MutableMappings. That means fsspec isn't compatible with zarrv3 either :) I'm not sure if these compatibility functions should also be defined at the fsspec layer or if zarr should provide a way to turn MutableMappings into one of its Store objects and preserve API compatibility. I think the latter is preferrable because that allows more users (not just fsspec and kerchunk) to easily adopt zarrv3.
I found that fsspec.mapping.FSMap or LazyReferenceMapper objects are passed to zarr.open and zarr.group calls.
No, this is not the model.
- LazyReferenceMapper provides references to referenceFS. It is also map-like, but further work must be done to turn references to bytes
- FSMap can be passed directly to zarr2, but it wraps it in a Store too, made for generic dict-like input
- there is also an FSspecStore, which is what you get if you pass a URL to zarr.open
- v3 has its own version of fsspecstore https://github.com/zarr-developers/zarr-python/pull/1785
Sorry I didn't grok that. Here I see fsspec tests creating LazyReferenceMappers and passing them to zarr.open. In ZarrV3, all the object creation functions run the StoreLike through this function. I don't see how a RemoteStore would be created unless a user explicitly passed it to the zarr.open call like this:
store = RemoteStore(url="abfs://...")
zarr.open(store=store)
there is also an FSspecStore, which is what you get if you pass a URL to zarr.open
Unless I'm missing it, passing a URL to zarr.open in zarrV3 will return a local store after it's run through make_store_path:
def make_store_path(store_like: StoreLike | None, *, mode: OpenMode | None = None) -> StorePath:
if isinstance(store_like, StorePath):
if mode is not None:
assert mode == store_like.store.mode
return store_like
elif isinstance(store_like, Store):
if mode is not None:
assert mode == store_like.mode
return StorePath(store_like)
elif store_like is None:
if mode is None:
mode = "w" # exception to the default mode = 'r'
return StorePath(MemoryStore(mode=mode))
elif isinstance(store_like, str):
return StorePath(LocalStore(Path(store_like), mode=mode or "r"))
raise TypeError
Did this work in ZarrV2?
zarr.open(store="abfs://...")?
passing a URL to zarr.open in zarrV3 will return a local store after it's run through make_store_path
zarr.open(store="abfs://...")?
Yes, this most certainly worked, and if it no longer does, I consider this a major regression and worse user experience. The optional argument storage_options gets passed to fsspec.
I see fsspec tests creating LazyReferenceMappers and passing them to zarr.open
I suppose you are right and this works too - the lazy mapper is also dict-like. It's probably a case of my lazyness in the tests; it only works where the "files" are stored as raw binary, not as references (which is the case for the tests).
cc @jhamman
Yes, this most certainly worked, and if it no longer does, I consider this a major regression and worse user experience. The optional argument
storage_optionsgets passed to fsspec.
It doesn't work in ZarrV3 yet:
# Missing storage_options but it's irrelevant
# this eventually tries to construct a StorePath(LocalStore, 'file://abfs:/daymet-zarr/daily/hi.zarr')
# and will throw because it's not writable
zarr.open(store="abfs://daymet-zarr/daily/hi.zarr")
I see fsspec tests creating LazyReferenceMappers and passing them to zarr.open
I suppose you are right and this works too - the lazy mapper is also dict-like. It's probably a case of my lazyness in the tests; it only works where the "files" are stored as raw binary, not as references (which is the case for the tests).
Laziness can be good, and your tests could probably use the MemoryStore at some point, but ZarrV3 shouldn't require you to change your call sites IMO.
Here probably - explicitly expects .zarray in every path containing data.