Opening a datatree from an S3 bucket
Hi all,
It seems that the current version of datatree cannot open stores located on cloud storage (tested with S3 only). For instance, trying to open a datatree with the same syntax as xarray.open_dataset (using fsspec chained URLs):
store = "zip::s3://bucket/path/product.zarr.zip"
dt = datatree.open_datatree(store, engine="zarr", backend_kwargs={"storage_options": {"s3": secrets["s3input"]}})
where secrets["s3input"] is a dict containing the AWS secret keys and endpoint URLs, fails with:
ClientError Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
    112 try:
--> 113     return await func(*args, **kwargs)
    114 except S3_RETRYABLE_ERRORS as e:
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiobotocore/client.py:408, in AioBaseClient._make_api_call(self, operation_name, api_params)
    407 error_class = self.exceptions.from_code(error_code)
--> 408 raise error_class(parsed_response, operation_name)
    409 else:
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
Indeed, in _open_datatree_zarr (datatree/io.py) the kwargs are not passed on to zarr.open_group, so in this case the storage_options are ignored, which would explain the 403 above. As a workaround in my specific case, replacing line 87 of datatree/io.py (v0.0.14)
zds = zarr.open_group(store, mode="r")
by
# backend_kwargs is {"storage_options": {...}}, so unpacking it passes storage_options=... to zarr
backend_kwargs = kwargs["backend_kwargs"]
zds = zarr.open_group(store, mode="r", **backend_kwargs)
works just fine.
Hi @vlevasseur073, sorry for the slow reply here.
We would welcome a PR to fix this!
Hi @TomNicholas, sorry for leaving this issue unanswered for so long. I recently came back to it and, in the meantime, checked the status of the datatree integration into pydata/xarray. I have now opened the equivalent issue https://github.com/pydata/xarray/issues/9197 and proposed a fix in PR https://github.com/pydata/xarray/pull/9198
Regards, Vincent
closing in favor of pydata/xarray#9197