
Make open_zarr work with UPath

Open. vladidobro opened this issue 6 months ago • 1 comment

Is your feature request related to a problem?

Hi xarray community.

When I pass a UPath to xr.open_zarr, it fails. I haven't checked the other open_* functions.

```python
import xarray as xr
from upath import UPath

up = UPath('az://[email protected]/mystore.zarr', anon=False)
print(up.storage_options)  # {'account_name': 'myaccount', 'anon': False}
# up.fs works fine
xr.open_zarr(up)
```

```
ValueError: unable to connect to account for Must provide either a connection_string or account_name with credentials!!
```

When I debugged it, the issue seems to be that xarray coerces the UPath into a string URL (in this case just 'az://mycontainer/mystore.zarr') and then calls fsspec.url_to_fs(url, **{'asynchronous': True}), so all storage options bound to the UPath are lost.

Describe the solution you'd like

I think it would be a very nice addition if xarray handled UPath specially, because it seems like the ideal way to keep a zarr store destination in a variable without performing any work.

I don't think the implementation can simply use the UPath.fs attribute as the filesystem when it encounters a UPath, because zarr needs asynchronous=True. So the solution might be to merge UPath.storage_options into the storage_options dict passed to xr.open_zarr.
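A minimal sketch of that merge, assuming a UPath-like object exposes a `storage_options` mapping and that `str()` returns its fsspec URL (as described for the failing call). `split_upath_like` is a hypothetical name, not an existing xarray function, and letting explicitly passed `storage_options` win over the ones bound to the path is a design assumption.

```python
from typing import Any, Mapping, Optional, Tuple


def split_upath_like(
    store: Any, storage_options: Optional[Mapping[str, Any]] = None
) -> Tuple[Any, dict]:
    """Hypothetical helper: unpack a UPath-like object into (url, storage_options).

    Assumes the object has a ``storage_options`` mapping and that ``str(store)``
    yields its URL, as UPath does. Objects without ``storage_options`` are
    returned unchanged together with the explicit options.
    """
    explicit = dict(storage_options or {})
    bound = getattr(store, "storage_options", None)
    if bound is None:
        # Plain strings, pathlib.Path, zarr store objects, ...: leave untouched.
        return store, explicit
    # Start from the options bound to the path; explicitly passed ones override.
    merged = {**dict(bound), **explicit}
    return str(store), merged
```

With such a helper, `xr.open_zarr(up)` could behave like `xr.open_zarr(url, storage_options=opts)` after `url, opts = split_upath_like(up)`, which is the same as the manual workaround of passing the URL and options separately. The `asynchronous=True` flag would still be added downstream, exactly as in the current string-based path.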

I could put up a proof-of-concept PR as a first contribution if you are interested.

Describe alternatives you've considered

Everything works if I pass the URL and storage options separately, but it is less convenient.

Additional context

UPath is a project by the fsspec community, and it seems to be fairly widely adopted across Python libraries. It extends the pathlib.Path interface to fsspec filesystems, such as S3. It can be used to bundle a path, protocol and storage options into a single object, which is more convenient than passing around tuple[str, dict[str, Any]]. https://github.com/fsspec/universal_pathlib

vladidobro, Jun 04 '25 03:06

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

welcome[bot], Jun 04 '25 03:06

I have seen a similar issue with open_dataset and the h5netcdf backend: there, too, the storage options are lost when the path is converted to a string. Could (U)Path.open() be used instead of systematically converting the filename to a string?
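A rough sketch of that idea, assuming the backend accepts an open binary file object (xarray's h5netcdf engine does accept file-like objects) and that a UPath-like object can be recognised by having both `storage_options` and an `open` method. `to_backend_input` is a hypothetical helper, not part of xarray.

```python
import os
from typing import Any


def to_backend_input(path: Any) -> Any:
    """Hypothetical helper: decide what to hand to a file-based backend.

    Objects carrying fsspec storage options (assumed UPath-like: they have
    ``storage_options`` and a callable ``open``) are opened in binary mode, so
    the options stay attached to the resulting file object. Other os.PathLike
    objects keep today's behaviour of being converted to a string path; any
    other input (e.g. a plain string) is returned unchanged.
    """
    has_options = getattr(path, "storage_options", None) is not None
    if has_options and callable(getattr(path, "open", None)):
        # The caller owns this handle and is responsible for closing it.
        return path.open("rb")
    if isinstance(path, os.PathLike):
        return os.fspath(path)
    return path
```

One trade-off of this approach is that the backend then receives a file handle rather than a path, so whoever calls `open()` also has to make sure it gets closed.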

mraspaud, Jul 25 '25 11:07