Opening a datatree from S3 bucket

vlevasseur073 opened this issue 1 year ago • 2 comments

Dear all,

It seems that the current version of datatree cannot open stores on cloud storage (tested with S3 only). For instance, this fails when opening a datatree with the same syntax as xarray.open_dataset (using fsspec chained URLs):

import datatree

store = "zip::s3://bucket/path/product.zarr.zip"
dt = datatree.open_datatree(store, engine="zarr", backend_kwargs={"storage_options": {"s3": secrets["s3input"]}})

Here secrets["s3input"] is a dict containing the AWS secret keys and endpoint URLs. The call fails with:

ClientError                               Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
    112 try:
--> 113     return await func(*args, **kwargs)
    114 except S3_RETRYABLE_ERRORS as e:

File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiobotocore/client.py:408, in AioBaseClient._make_api_call(self, operation_name, api_params)
    407     error_class = self.exceptions.from_code(error_code)
--> 408     raise error_class(parsed_response, operation_name)
    409 else:

ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
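For reference, the nested {"s3": ...} dict in the call above follows fsspec's chained-URL convention: for a URL like zip::s3://..., the options for each layer are keyed by that layer's protocol name, so the credentials reach the s3 layer and not the zip layer. The sketch below mimics only that routing step with a hypothetical split_chained_url helper (not an fsspec function); fsspec's real parsing handles more cases than this.

```python
from typing import Any, Dict, List, Tuple


def split_chained_url(
    url: str, storage_options: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, str, Dict[str, Any]]]:
    # Hypothetical helper, not part of fsspec: split a chained URL on "::"
    # and attach to each layer the options keyed by its protocol name.
    layers = []
    for part in url.split("::"):
        if "://" in part:
            protocol, path = part.split("://", 1)
        else:
            # A bare protocol such as "zip" carries no path of its own here.
            protocol, path = part, ""
        layers.append((protocol, path, storage_options.get(protocol, {})))
    return layers


layers = split_chained_url(
    "zip::s3://bucket/path/product.zarr.zip",
    {"s3": {"key": "KEY", "secret": "SECRET"}},  # placeholder credentials
)
for layer in layers:
    print(layer)
# ('zip', '', {})
# ('s3', 'bucket/path/product.zarr.zip', {'key': 'KEY', 'secret': 'SECRET'})
```

If the options dict never reaches the code that builds these layers, the s3 layer is created without credentials, which is consistent with the 403 above.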

Indeed, in _open_datatree_zarr in datatree/io.py, the kwargs are not passed to zarr.open_group, so in this case the storage_options (and with them the S3 credentials) are ignored. As a workaround for my case, replacing line 87 of datatree/io.py (v0.0.14)

zds = zarr.open_group(store, mode="r")

by

backend_kwargs = kwargs.get("backend_kwargs", {})  # e.g. {"storage_options": {...}}
zds = zarr.open_group(store, mode="r", **backend_kwargs)

works just fine.
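A minimal, self-contained sketch of the pattern behind this bug and the workaround. open_group_stub, open_group_buggy and open_group_fixed are hypothetical names; the stub stands in for zarr.open_group so the example runs without zarr or S3. Unlike the one-line workaround, the fixed version forwards only storage_options rather than all of backend_kwargs, on the assumption that other backend keys (for example consolidated) may not be accepted by zarr.open_group.

```python
from typing import Any, Dict, Optional


def open_group_stub(store: str, mode: str = "r",
                    storage_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Hypothetical stand-in for zarr.open_group: it just records its arguments.
    return {"store": store, "mode": mode, "storage_options": storage_options}


def open_group_buggy(store: str, **kwargs: Any) -> Dict[str, Any]:
    # Mirrors the reported behaviour: kwargs are accepted but not forwarded,
    # so any storage_options (credentials, endpoint) are silently dropped.
    return open_group_stub(store, mode="r")


def open_group_fixed(store: str, **kwargs: Any) -> Dict[str, Any]:
    # Extract storage_options from backend_kwargs (if present) and forward
    # only that key, leaving other backend-specific options behind.
    backend_kwargs = kwargs.get("backend_kwargs") or {}
    storage_options = backend_kwargs.get("storage_options")
    return open_group_stub(store, mode="r", storage_options=storage_options)


# Placeholder values, not real credentials.
store = "zip::s3://bucket/path/product.zarr.zip"
opts = {"s3": {"key": "KEY", "secret": "SECRET"}}

print(open_group_buggy(store, backend_kwargs={"storage_options": opts})["storage_options"])
# None
print(open_group_fixed(store, backend_kwargs={"storage_options": opts})["storage_options"])
# {'s3': {'key': 'KEY', 'secret': 'SECRET'}}
```

Forwarding a single named key keeps the group opener's call explicit; forwarding all of backend_kwargs is shorter but ties it to whatever keys the user happens to pass.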

vlevasseur073, Mar 13 '24 14:03

Hi @vlevasseur073, sorry for the slow reply here.

We would welcome a PR to fix this!

TomNicholas, Apr 20 '24 01:04

Hi @TomNicholas, sorry for leaving this issue unanswered for so long. I recently came back to it and, in the meantime, checked the status of the datatree integration into pydata/xarray. I've now opened the equivalent issue https://github.com/pydata/xarray/issues/9197 and proposed a PR https://github.com/pydata/xarray/pull/9198.

Regards, Vincent

vlevasseur073, Jul 04 '24 14:07

closing in favor of pydata/xarray#9197

keewis, Aug 13 '24 16:08