zarr v3 does not accept stores of type FSMap
Zarr version
v3.0.0
Numcodecs version
v0.14.1
Python Version
3.11.11
Operating System
Mac
Installation
using pip into conda environment
Description
With Zarr v2, I was able to pass a store of type fsspec.mapping.FSMap. However, with Zarr v3, I get the following error:
TypeError: Unsupported type for store_like: 'FSMap'
Is this a bug, a breaking change, or was passing mappers like FSMap not officially supported even in Zarr v2? The snippet below only works with Zarr v2.
Steps to reproduce
import tempfile
import fsspec
import zarr
fs = fsspec.filesystem("file")
with tempfile.TemporaryDirectory() as tmpdir:
    mapper = fs.get_mapper(tmpdir)
    z = zarr.open(store=mapper, shape=(100, 100), chunks=(10, 10), dtype="f4")
Additional output
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[7], line 9
7 with tempfile.TemporaryDirectory() as tmpdir:
8 mapper = fs.get_mapper(tmpdir)
----> 9 z = zarr.open(store=mapper, shape=(100, 100), chunks=(10, 10), dtype="f4")
File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/_compat.py:43, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
41 extra_args = len(args) - len(all_args)
42 if extra_args <= 0:
---> 43 return f(*args, **kwargs)
45 # extra_args > 0
46 args_msg = [
47 f"{name}={arg}"
48 for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:], strict=False)
49 ]
File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/api/synchronous.py:190, in open(store, mode, zarr_version, zarr_format, path, storage_options, **kwargs)
152 @_deprecate_positional_args
153 def open(
154 store: StoreLike | None = None,
(...)
161 **kwargs: Any, # TODO: type kwargs as valid args to async_api.open
162 ) -> Array | Group:
163 """Open a group or array using file-mode-like semantics.
164
165 Parameters
(...)
188 Return type depends on what exists in the given store.
189 """
--> 190 obj = sync(
191 async_api.open(
192 store=store,
193 mode=mode,
194 zarr_version=zarr_version,
195 zarr_format=zarr_format,
196 path=path,
197 storage_options=storage_options,
198 **kwargs,
199 )
200 )
201 if isinstance(obj, AsyncArray):
202 return Array(obj)
File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/core/sync.py:142, in sync(coro, loop, timeout)
139 return_result = next(iter(finished)).result()
141 if isinstance(return_result, BaseException):
--> 142 raise return_result
143 else:
144 return return_result
File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/core/sync.py:98, in _runner(coro)
93 """
94 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
95 exception, the exception will be returned.
96 """
97 try:
---> 98 return await coro
99 except Exception as ex:
100 return ex
File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/api/asynchronous.py:309, in open(store, mode, zarr_version, zarr_format, path, storage_options, **kwargs)
280 """Convenience function to open a group or array using file-mode-like semantics.
281
282 Parameters
(...)
305 Return type depends on what exists in the given store.
306 """
307 zarr_format = _handle_zarr_version_or_format(zarr_version=zarr_version, zarr_format=zarr_format)
--> 309 store_path = await make_store_path(store, mode=mode, path=path, storage_options=storage_options)
311 # TODO: the mode check below seems wrong!
312 if "shape" not in kwargs and mode in {"a", "r", "r+", "w"}:
File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/storage/_common.py:316, in make_store_path(store_like, path, mode, storage_options)
314 else:
315 msg = f"Unsupported type for store_like: '{type(store_like).__name__}'" # type: ignore[unreachable]
--> 316 raise TypeError(msg)
318 result = await StorePath.open(store, path=path_normalized, mode=mode)
320 if storage_options and not used_storage_options:
TypeError: Unsupported type for store_like: 'FSMap'
Accepting instances of FSMap was part of the v2 design (see this PR), so I'd say you are seeing a regression / bug.
I will let the fsspec experts weigh in on whether there's any good reason not to accept FSMap instances here -- this may very well be a case of "we didn't implement this yet", in which case we should implement it :)
This was an intentional breaking change. fsspec's FSMap interface was used as a drop-in for Zarr's previous store interface (i.e. MutableMapping).
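To see why an FSMap slotted in so easily under v2, here is a minimal sketch, assuming only that fsspec is installed (it uses fsspec's in-memory filesystem, so nothing touches disk or the network). An FSMap is a collections.abc.MutableMapping whose keys are paths relative to its root and whose values are raw bytes, which matches the MutableMapping store interface Zarr-Python 2 used. The root "/fsmap-demo" and the key written are arbitrary illustrative choices.

```python
from collections.abc import MutableMapping

import fsspec

# fsspec's "memory" filesystem keeps everything in process, so this is safe to run anywhere.
fs = fsspec.filesystem("memory")
mapper = fs.get_mapper("/fsmap-demo")

# FSMap subclasses MutableMapping, the interface Zarr-Python 2 accepted as a store.
is_mapping = isinstance(mapper, MutableMapping)

# Keys are paths relative to the mapper's root; values are raw bytes.
mapper["group/.zgroup"] = b'{"zarr_format": 2}'
stored = mapper["group/.zgroup"]
keys = sorted(mapper)

print(is_mapping, keys)
```

Zarr-Python 3 stores are instead async classes built on zarr.abc.store.Store, so this mapping shape alone is no longer enough.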
If you have a fsspec FileSystem, you can create the store directly though. Something like this should work:
import fsspec
import zarr
fs = fsspec.filesystem("s3", asynchronous=True)
store = zarr.storage.FsspecStore(fs, path="bucket/foo/bar")
z = zarr.open(store=store, shape=(100, 100), chunks=(10, 10), dtype="f4")
Note that the file implementation in fsspec does not support async, so it cannot currently be used in zarr (see #2533 for a WIP PR).
I'll label this as docs, since it would be good to put the above comment in our migration guide.
This is going to bite a lot of people. 😬
It is relatively easy to convert an FSMap object to the type of store we need. We could consider adding some convenience layer for this.
@martindurant will know best, but remember that we're using fsspec's async interface, which is not active in most (any?) FSMap implementations. There was concern a while back about loop-in-loop issues when using filesystems with asynchronous=False.
Rather than wrapping, passing the filesystem and path arguments directly seems the right thing to do, which was already suggested above.
I'm not sure what might happen if a (sync) FSMap were used as the (hidden) dict within an (async) MemoryStore. It would probably work, and for local files it might not make a performance difference?
👍 - very open to us finding a way to accept FSMap objects as store arguments - we'll need to convert them to an FsspecStore, but it sounds like we can do that without much risk.
we'll need to convert them to an FSspecStore
This will generally require wrapping sync FSs or setting asynchronous=True for async-able FSs.
For the record, we ran into this issue when we tried to upgrade to 3.0.1 yesterday to deal with a python dependency build issue. So +1 on this causing users some pain. We're now planning to go ahead and take the plunge earlier than planned to transition to zarr>=3 using your handy, dandy guide: https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html.
First, apologies if this is not the appropriate place to post/ask this. I'm confused and stuck.
I'm trying to reproduce any of several notebooks from the CMIP6 Pangeo Gallery in a Google Colab notebook (here I'm following https://gallery.pangeo.io/repos/pangeo-gallery/cmip6/basic_search_and_load.html, but I haven't found any that don't use zarr), and I keep getting the "Unsupported type for store_like: 'FSMap'" error at the xr.open_zarr step. I don't really understand @jhamman's suggestion to create a store directly, but here is what I tried:
!pip install zarr
import pandas as pd
import xarray as xr
import zarr
import fsspec
# Directly following the example notebook to find this data:
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df_ta = df.query("activity_id=='CMIP' & table_id == 'Amon' & variable_id == 'tas' & experiment_id == 'historical'")
df_ta_ncar = df_ta.query('institution_id == "NCAR"')
# Get the path to a specific zarr store (the last one in the dataframe above)
zstore = df_ta_ncar.zstore.values[-1]
print(zstore)
# Here is where I deviate from the example notebook and try to follow the "create the store directly" advice without understanding it:
fs = fsspec.filesystem("gs", asynchronous=True)
zstore = zarr.storage.FsspecStore(fs, path="/cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/")
# Create a mutable-mapping-style interface to the store
mapper = fsspec.get_mapper(zstore)
# Open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
This returns the error
---> 18 ds = xr.open_zarr(mapper, consolidated=True)
...
TypeError: Unsupported type for store_like: 'FSMap'
@marysa 👋 - I think you want:
fs = fsspec.filesystem("gs", asynchronous=True)
zstore = zarr.storage.FsspecStore(fs, path="/cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/")
ds = xr.open_zarr(zstore, consolidated=True)
Alternatively, you can likely just pass the gs url directly to xarray:
url = "gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226"
ds = xr.open_zarr(url, consolidated=True)
https://github.com/zarr-developers/zarr-python/pull/2774 should fix this so that you don't need to change the original code
Amazing, @jhamman's solution worked! And if #2774 fixes it long-term, even better! Thanks so much!
Chiming in here with another related use-case that no longer works in v3. In our lab's case, we're reading ZipStores from a central data server over SFTP via sshfs. Here's a minimal working example that worked in v2 but now raises the TypeError: Unsupported type for store_like: 'FSMap' mentioned in the OP:
>>> import zarr
>>> import fsspec
>>> from fsspec.implementations.zip import ZipFileSystem
>>> from sshfs import SSHFileSystem
>>> rfs = SSHFileSystem("<data server hostname>", username="<username>")
>>> fm = rfs.open("<path-to-zarr-archive>.zarr.zip")
>>> zfs = ZipFileSystem(fm, mode="r")
>>> store = fsspec.FSMap("", zfs, check=False)
>>> z = zarr.group(store=store);
If this is now the recommended syntax
fs = fsspec.filesystem("gs", asynchronous=True)
zstore = zarr.storage.FsspecStore(fs, path="/cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/")
for creating a zarr store to interact with, then presumably the section about writing to zarr on xarray's IO docs page is outdated? It uses this example:
import gcsfs
fs = gcsfs.GCSFileSystem(project="<project-name>", token=None)
gcsmap = gcsfs.mapping.GCSMap("<bucket-name>", gcs=fs, check=True, create=False)
# write to the bucket
ds.to_zarr(store=gcsmap)
Also @rossbar your example doesn't use xarray at all - I recommend you raise that upstream on the zarr-python issue tracker.
This also works and is a lot simpler:
ds.to_zarr("gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/")
Hopefully this is the correct place to ask. I'm trying to create a Zarr store and am getting errors.
This is our old way in zarr v2:
import s3fs
import zarr
s3 = s3fs.S3FileSystem()
store = s3fs.S3Map(root=f"{bucket_name}/zarr/{data_field_id}/", s3=s3)
zarr_store = zarr.open(store, mode="w")
I tried this in zarr v3:
import s3fs
import zarr
# Create the Zarr store
s3 = s3fs.S3FileSystem()
store = zarr.storage.FsspecStore(s3, path=f"{bucket_name}/zarr/{data_field_id}/")
zarr_store = zarr.open_group(store, mode="w")
For the last line, I also tried zarr_store = zarr.create_group(store). Both fail on that last line with: TypeError: object bytes can't be used in 'await' expression. Do you know what is wrong here? Thanks, all.
hi @dieumynguyen, sorry for the delay here. I suspect in your case the problem comes from using moto, in particular the mock_aws decorator, which does not support asynchronous operations. See this stackoverflow question for some discussion about this issue. Since our store API is now async, a lot of things in vanilla moto will not work (@martindurant correct me if I'm wrong here).
In our test suite, we mock AWS by starting up a moto server. See the FsspecStore tests for an example of this. It's a lot more verbose than the mock_aws decorator, but it works in our test suite.
@d-v-b - Thanks! This should be a relatively simple change for us. We already have some other unit tests that start and stop the moto server.
#2774 added support for FSMap objects that host the most common fsspec filesystems (e.g., s3fs, adlfs, gcsfs).
I think we should either leave this open or create a new issue for tracking support for FSMap objects that wrap fsspec filesystems that wrap other fsspec filesystems. E.g., a common pattern for using ReferenceFileSystem with Zarr-Python 2 still does not work with Zarr-Python 3.