zarr v3 does not accept stores of type FSMap

Open malmans2 opened this issue 11 months ago • 20 comments

Zarr version

v3.0.0

Numcodecs version

v0.14.1

Python Version

3.11.11

Operating System

Mac

Installation

using pip into conda environment

Description

With Zarr v2, I was able to pass a store of type fsspec.mapping.FSMap. However, with Zarr v3, I get the following error:

TypeError: Unsupported type for store_like: 'FSMap'

Is this a bug, a breaking change, or was passing mappers like FSMap not officially supported even in Zarr v2? The snippet below only works with Zarr v2.

Steps to reproduce

import tempfile

import fsspec
import zarr

fs = fsspec.filesystem("file")
with tempfile.TemporaryDirectory() as tmpdir:
    mapper = fs.get_mapper(tmpdir)
    z = zarr.open(store=mapper, shape=(100, 100), chunks=(10, 10), dtype="f4")

Additional output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 9
      7 with tempfile.TemporaryDirectory() as tmpdir:
      8     mapper = fs.get_mapper(tmpdir)
----> 9     z = zarr.open(store=mapper, shape=(100, 100), chunks=(10, 10), dtype="f4")

File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/_compat.py:43, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     41 extra_args = len(args) - len(all_args)
     42 if extra_args <= 0:
---> 43     return f(*args, **kwargs)
     45 # extra_args > 0
     46 args_msg = [
     47     f"{name}={arg}"
     48     for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:], strict=False)
     49 ]

File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/api/synchronous.py:190, in open(store, mode, zarr_version, zarr_format, path, storage_options, **kwargs)
    152 @_deprecate_positional_args
    153 def open(
    154     store: StoreLike | None = None,
   (...)
    161     **kwargs: Any,  # TODO: type kwargs as valid args to async_api.open
    162 ) -> Array | Group:
    163     """Open a group or array using file-mode-like semantics.
    164
    165     Parameters
   (...)
    188         Return type depends on what exists in the given store.
    189     """
--> 190     obj = sync(
    191         async_api.open(
    192             store=store,
    193             mode=mode,
    194             zarr_version=zarr_version,
    195             zarr_format=zarr_format,
    196             path=path,
    197             storage_options=storage_options,
    198             **kwargs,
    199         )
    200     )
    201     if isinstance(obj, AsyncArray):
    202         return Array(obj)

File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/core/sync.py:142, in sync(coro, loop, timeout)
    139 return_result = next(iter(finished)).result()
    141 if isinstance(return_result, BaseException):
--> 142     raise return_result
    143 else:
    144     return return_result

File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/core/sync.py:98, in _runner(coro)
     93 """
     94 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
     95 exception, the exception will be returned.
     96 """
     97 try:
---> 98     return await coro
     99 except Exception as ex:
    100     return ex

File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/api/asynchronous.py:309, in open(store, mode, zarr_version, zarr_format, path, storage_options, **kwargs)
    280 """Convenience function to open a group or array using file-mode-like semantics.
    281
    282 Parameters
   (...)
    305     Return type depends on what exists in the given store.
    306 """
    307 zarr_format = _handle_zarr_version_or_format(zarr_version=zarr_version, zarr_format=zarr_format)
--> 309 store_path = await make_store_path(store, mode=mode, path=path, storage_options=storage_options)
    311 # TODO: the mode check below seems wrong!
    312 if "shape" not in kwargs and mode in {"a", "r", "r+", "w"}:

File ~/miniforge3/envs/zarr/lib/python3.11/site-packages/zarr/storage/_common.py:316, in make_store_path(store_like, path, mode, storage_options)
    314     else:
    315         msg = f"Unsupported type for store_like: '{type(store_like).__name__}'"  # type: ignore[unreachable]
--> 316         raise TypeError(msg)
    318     result = await StorePath.open(store, path=path_normalized, mode=mode)
    320 if storage_options and not used_storage_options:

TypeError: Unsupported type for store_like: 'FSMap'

malmans2 avatar Jan 14 '25 14:01 malmans2

accepting instances of FSMap was part of the v2 design (see this PR), so I'd say you are seeing a regression / bug.

I will let the fsspec experts weigh in on whether there's any good reason not to accept FSMap instances here -- this may very well be a case of "we didn't implement this yet", in which case we should implement it :)

d-v-b avatar Jan 14 '25 14:01 d-v-b

This was an intentional breaking change. Fsspec's FSMap interface was used as a drop-in for Zarr's previous store interface (i.e. MutableMapping), which Zarr v3 has replaced with an async store API.

If you have a fsspec FileSystem, you can create the store directly though. Something like this should work:

import fsspec
import zarr

fs = fsspec.filesystem("s3", asynchronous=True)
store = zarr.storage.FsspecStore(fs, path="bucket/foo/bar")
z = zarr.open(store=store, shape=(100, 100), chunks=(10, 10), dtype="f4")

Note that the file implementation in fsspec does not support async, so it cannot currently be used in zarr (see #2533 for a WIP PR).

jhamman avatar Jan 17 '25 17:01 jhamman

I'll label this as docs, since it would be good to put the above comment in our migration guide.

dstansby avatar Jan 17 '25 18:01 dstansby

This is going to bite a lot of people. 😬

It is relatively easy to convert an FSMap object to the type of store we need. We could consider adding some convenience layer for this.
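
As a rough sketch of what such a conversion could look like: an FSMap keeps its filesystem and root path on the `fs` and `root` attributes, and those are the two things `FsspecStore` needs. The helper name `fsmap_to_store_kwargs` is hypothetical, and a stand-in object replaces a real FSMap so the snippet runs without fsspec installed. It does not address whether the filesystem is async-capable.

```python
from types import SimpleNamespace


def fsmap_to_store_kwargs(mapper):
    """Return the filesystem and root path held by an FSMap-like object.

    Hypothetical helper: it relies only on the `fs` and `root`
    attributes that fsspec's FSMap exposes.
    """
    return {"fs": mapper.fs, "path": mapper.root}


# Stand-in for a real FSMap (assumption: illustration only).
fake_mapper = SimpleNamespace(fs="<filesystem object>", root="bucket/foo/bar")
kwargs = fsmap_to_store_kwargs(fake_mapper)
print(kwargs)

# With zarr v3 and an async-capable filesystem, the result could be passed on:
#   store = zarr.storage.FsspecStore(**kwargs)
```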

rabernat avatar Jan 17 '25 19:01 rabernat

It is relatively easy to convert an FSMap object to the type of store we need. We could consider adding some convenience layer for this.

@martindurant will know best, but remember that we're using Fsspec's async interface, which is not active in most (any?) FSMap implementations. There was concern a while back about loop-in-loop issues if using filesystems with asynchronous=False.
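
The loop-in-loop hazard can be shown with the standard library alone. This is a simplified illustration of the general problem, not a description of how fsspec's own sync machinery works; `fetch_chunk` and `blocking_read` are hypothetical names.

```python
import asyncio


async def fetch_chunk():
    """Stand-in for an async store read."""
    await asyncio.sleep(0)
    return b"chunk"


def blocking_read():
    """Sync wrapper that starts its own event loop to run the coroutine."""
    coro = fetch_chunk()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "never awaited" warning before re-raising
        raise


async def main():
    # Calling the sync wrapper from inside a running loop fails,
    # because asyncio.run() refuses to nest event loops.
    try:
        blocking_read()
    except RuntimeError as exc:
        return f"nested loop rejected: {type(exc).__name__}"
    return "no error"


outside = blocking_read()     # no loop is running yet, so this succeeds
inside = asyncio.run(main())  # a loop is already running, so the nested call fails
print(outside, inside)
```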

jhamman avatar Jan 17 '25 20:01 jhamman

Rather than wrapping, passing the filesystem and path arguments directly seems the right thing to do, which was already suggested above.

I'm not sure what might happen if a (sync) FSMap was used as the (hidden) dict within an (async) MemoryStore. It would probably work, and for local files it might not make a performance difference?

martindurant avatar Jan 17 '25 20:01 martindurant

👍 - very open to us finding a way to accept FSMap objects as store arguments - we'll need to convert them to an FsspecStore but it sounds like we can do that without much risk.

jhamman avatar Jan 23 '25 01:01 jhamman

we'll need to convert them to an FsspecStore

This will generally require wrapping sync FSs or setting asynchronous=True for async-able FSs.
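
A minimal sketch of what "wrapping" a sync interface can mean, using only the standard library: blocking calls are pushed to a worker thread so they can be awaited. `AsyncMappingAdapter` is a hypothetical class for illustration, and a plain dict stands in for a sync FSMap; it is not the mechanism zarr or fsspec actually uses.

```python
import asyncio


class AsyncMappingAdapter:
    """Hypothetical adapter exposing async get/set over a sync MutableMapping."""

    def __init__(self, mapping):
        self._mapping = mapping

    async def get(self, key):
        # Run the blocking lookup in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._mapping.get, key)

    async def set(self, key, value):
        # Same approach for writes.
        await asyncio.to_thread(self._mapping.__setitem__, key, value)


async def demo():
    store = AsyncMappingAdapter({})  # a plain dict stands in for a sync FSMap
    await store.set("zarr.json", b"{}")
    return await store.get("zarr.json")


result = asyncio.run(demo())
print(result)
```

The thread hop adds overhead per call, which is one reason setting asynchronous=True on a filesystem that supports it is usually preferable to wrapping.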

martindurant avatar Jan 23 '25 16:01 martindurant

For the record, we ran into this issue when we tried to upgrade to 3.0.1 yesterday to deal with a python dependency build issue. So +1 on this causing users some pain. We're now planning to go ahead and take the plunge earlier than planned to transition to zarr>=3 using your handy, dandy guide: https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html.

christine-e-smit avatar Jan 23 '25 21:01 christine-e-smit

First, apologies if this is not the appropriate place to post/ask this. I'm confused and stuck.

This was an intentional breaking change. Fsspec's FSMap interface was used as a drop-in for Zarr's previous store interface (i.e. MutableMapping).

If you have a fsspec FileSystem, you can create the store directly though. Something like this should work:

import fsspec
import zarr

fs = fsspec.filesystem("s3", asynchronous=True)
store = zarr.storage.FsspecStore(fs, path="bucket/foo/bar")
z = zarr.open(store=store, shape=(100, 100), chunks=(10, 10), dtype="f4")

Note that the file implementation in fsspec does not support async, so it cannot currently be used in zarr (see #2533 for a WIP PR).

I'm trying to reproduce several of the notebooks from the CMIP6 Pangeo Gallery in a Google Colab notebook. In this example I'm following https://gallery.pangeo.io/repos/pangeo-gallery/cmip6/basic_search_and_load.html, but I haven't found any that don't use zarr. I keep getting the "Unsupported type for store_like: 'FSMap'" error at the xr.open_zarr step. I don't really understand @jhamman's suggestion to create a store directly, but here is what I tried:

!pip install zarr 

import pandas as pd
import xarray as xr
import zarr
import fsspec

# Directly following the example notebook to find this data:
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df_ta = df.query("activity_id=='CMIP' & table_id == 'Amon' & variable_id == 'tas' & experiment_id == 'historical'")
df_ta_ncar = df_ta.query('institution_id == "NCAR"')

# get the path to a specific zarr store (the first one from the dataframe above)
zstore = df_ta_ncar.zstore.values[-1]
print(zstore)

# here is where I deviate from the example notebook and try and follow the "create the store directly" advice without understanding it:
fs = fsspec.filesystem("gs",asynchronous=True)
zstore = zarr.storage.FsspecStore(fs,path="/cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/")

# create a mutable-mapping-style interface to the store
mapper = fsspec.get_mapper(zstore)

# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)

This returns the error:

---> 18 ds = xr.open_zarr(mapper, consolidated=True)
...
TypeError: Unsupported type for store_like: 'FSMap'

marysa avatar Feb 05 '25 00:02 marysa

@marysa 👋 - I think you want:

fs = fsspec.filesystem("gs", asynchronous=True)
zstore = zarr.storage.FsspecStore(fs, path="/cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/")

ds = xr.open_zarr(zstore, consolidated=True)

Alternatively, you can likely just pass the gs url directly to xarray:

url = "gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226"
ds = xr.open_zarr(url, consolidated=True)

jhamman avatar Feb 05 '25 03:02 jhamman

https://github.com/zarr-developers/zarr-python/pull/2774 should fix this so that you don't need to change the original code

martindurant avatar Feb 05 '25 15:02 martindurant

Amazing, @jhamman's solution worked! And if #2774 fixes it long-term, even better! Thanks so much!

marysa avatar Feb 05 '25 18:02 marysa

Chiming in here with another related use-case that no longer works in v3. In our lab's case, we're reading ZipStores from a central data server over SFTP via sshfs. Here's a minimal working example that worked in v2 but now raises the TypeError: Unsupported type for store_like: 'FSMap' mentioned in the OP:

>>> import zarr
>>> import fsspec
>>> from fsspec.implementations.zip import ZipFileSystem
>>> from sshfs import SSHFileSystem
>>> rfs = SSHFileSystem("<data server hostname>", username="<username>")
>>> fm = rfs.open("<path-to-zarr-archive>.zarr.zip")
>>> zfs = ZipFileSystem(fm, mode="r")
>>> store = fsspec.FSMap("", zfs, check=False)
>>> z = zarr.group(store=store);

rossbar avatar Feb 26 '25 21:02 rossbar

If this is now the recommended syntax

fs = fsspec.filesystem("gs", asynchronous=True)
zstore = zarr.storage.FsspecStore(fs, path="/cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/")

for creating a zarr store to interact with, then presumably that now means that the section in xarray's IO docs page about writing to zarr is outdated? That uses this example

import gcsfs

fs = gcsfs.GCSFileSystem(project="<project-name>", token=None)
gcsmap = gcsfs.mapping.GCSMap("<bucket-name>", gcs=fs, check=True, create=False)
# write to the bucket
ds.to_zarr(store=gcsmap)

Also @rossbar your example doesn't use xarray at all - I recommend you raise that upstream on the zarr-python issue tracker.

TomNicholas avatar Apr 08 '25 20:04 TomNicholas

This also works and is a lot simpler:

ds.to_zarr("gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/")

rabernat avatar Apr 08 '25 20:04 rabernat

Hopefully this is the correct place to ask. I'm trying to create a Zarr store and am getting errors.

This is our old way in zarr v2:

s3 = s3fs.S3FileSystem()
store = s3fs.S3Map(root=f"{bucket_name}/zarr/{data_field_id}/", s3=s3)
zarr_store = zarr.open(store, mode="w")

I tried this in zarr v3:

# Create the Zarr store
s3 = s3fs.S3FileSystem()
store = zarr.storage.FsspecStore(s3, path=f"{bucket_name}/zarr/{data_field_id}/")
zarr_store = zarr.open_group(store, mode="w")

For the last line, I also tried zarr_store = zarr.create_group(store). Both raise the same error on that last line: TypeError: object bytes can't be used in 'await' expression. Do you know what is wrong here? Thanks, all.

dieumynguyen avatar Apr 22 '25 22:04 dieumynguyen

Hi @dieumynguyen, sorry for the delay here. I suspect the problem in your case comes from using moto, in particular the mock_aws decorator, which does not support asynchronous operations. See this Stack Overflow question for some discussion about this issue. Since our store API is now async, a lot of things in vanilla moto will not work (@martindurant correct me if I'm wrong here).

In our test suite, we mock AWS by starting up a moto server. See the FsspecStore tests for an example of this. It's a lot more verbose than the mock_aws decorator, but it works in our test suite.
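
The error class reported above can be reproduced without moto or zarr. This sketch assumes the failure mode is simply that a mocked call returns plain bytes where async code expects an awaitable; `sync_mock_read` is a hypothetical stand-in, and the exact wording of the message can differ between Python versions.

```python
import asyncio


def sync_mock_read():
    """Stand-in for a mocked client method that returns bytes directly."""
    return b"payload"


async def async_caller():
    # Async code expects an awaitable here; a plain bytes value is not one.
    return await sync_mock_read()


try:
    asyncio.run(async_caller())
    outcome = "ok"
except TypeError as exc:
    outcome = type(exc).__name__
    print(exc)
```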

d-v-b avatar May 19 '25 19:05 d-v-b

@d-v-b - Thanks! This should be a relatively simple change for us. We already have some other unit tests that start and stop the moto server.

christine-e-smit avatar May 19 '25 19:05 christine-e-smit

#2774 added support for FSMap objects that host the most common fsspec filesystems (e.g., s3fs, adlfs, gcsfs).

I think we should either leave this open or create a new issue for tracking support for FSMap objects that wrap fsspec filesystems that wrap other fsspec filesystems. E.g., a common pattern for using ReferenceFileSystem with Zarr-Python 2 still does not work with Zarr-Python 3.

maxrjones avatar Jun 16 '25 14:06 maxrjones