zarr-python
Accessing restricted groups over s3 produces zero filled array
Zarr version
2.15.0
Numcodecs version
0.11.0
Python Version
3.11.4
Operating System
Linux/Ubuntu
Installation
pip install zarr
Description
Hi,
I am attempting to read a consolidated zarr file with lots of groups remotely from a minio s3 bucket. Everything works nicely with anonymous access. However, when I try to restrict some groups using the minio ACL tools, Zarr returns a zero-filled array with no error. I would expect Zarr/fsspec to raise a "permission denied" or similar error.
For example, I have the following structure, and want to deny anonymous access to group 28412.
amc_test.zarr
├── .zgroup
├── .zmetadata
├── 28412 ---> Access is denied in s3 ACL to this group and below.
│   ├── .zgroup
│   ├── data
│   │   ├── .zarray
│   │   └── 0
│   ├── error
│   │   ├── .zarray
│   │   └── 0
│   └── time
│       ├── .zarray
│       └── 0
├── 28415 ---> Still want public access to this group
│   ├── .zgroup
│   ├── data
│   │   ├── .zarray
│   │   └── 0
│   ├── error
│   │   ├── .zarray
│   │   └── 0
│   └── time
│       ├── .zarray
│       └── 0
I can set the corresponding ACL to deny anonymous access on that particular group in minio. But when I try to read the data, instead of getting an error, I get an array filled with zeros. I guess Zarr sees that the file is not readable and assumes that it doesn't exist, rather than throwing an error.
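To make the suspected mechanism concrete: in Zarr, a chunk whose key is absent from the store is read back as the array's fill value rather than as an error. The sketch below imitates that with a plain dict. `read_chunk` is a hypothetical helper written for illustration, not zarr's actual code, and the chunk contents are made up.

```python
# Hypothetical stand-in for a chunked array reader; this is NOT zarr's code.
def read_chunk(store, key, chunk_len, fill_value=0):
    """Return the chunk stored under `key`, or a fill-value chunk if the key is absent."""
    try:
        return list(store[key])
    except KeyError:
        # An absent chunk is treated as "never written", so no error is raised.
        return [fill_value] * chunk_len


# Only the public group's chunk is present; "28412/data/0" is deliberately absent.
store = {"28415/data/0": [1.5, 2.5, 3.5]}

public = read_chunk(store, "28415/data/0", chunk_len=3)
hidden = read_chunk(store, "28412/data/0", chunk_len=3)
```

If a permission failure is converted to a missing key anywhere between s3 and the array, the reader has no way of telling "denied" from "never written", which would match the zero-filled result described above.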
Steps to reproduce
Here is a simple example of how I am accessing the data from s3. The minio bucket is at localhost:10101. If I turn off the ACL in minio then this script runs without errors. However, I would expect an error to be thrown on the penultimate line. Instead I get a zero-filled array.
import s3fs
import zarr
import numpy as np
file_name = 'amc_test.zarr'
s3 = s3fs.S3FileSystem(
    anon=True,
    use_ssl=False,
    client_kwargs={
        "endpoint_url": "http://localhost:10101"
    },
)
store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3)
handle = zarr.open_consolidated(store)
arr = handle['28415']['data'][:] # <-- This works as expected. There is no access control on this group.
assert not np.all(arr == 0)
arr = handle['28412']['data'][:] # <-- I expected an error to be thrown here.
assert not np.all(arr == 0) # <-- This assert fails.
Additional output
No response
Thanks for the clear issue, @samueljackson92. My assumption is you are running into the inverse problem that I had -- https://github.com/fsspec/filesystem_spec/issues/342 -- i.e. some implementations were raising a 403 on chunks leading to application errors. Could you try configuring the exceptions at:
https://github.com/zarr-developers/zarr-python/pull/546/files#diff-565e487a2f60258b6baa2e4db8ef175cc16b8a949651834bd43d0a9f21e07358R974
exceptions=(KeyError, PermissionError, IOError)
such that you see the behavior you are looking for? It might be that I was looking for things to be too tolerant, but you raise a good point.
Hi @joshmoore, thanks for your help. I hadn't noticed the options for tweaking exceptions in FSStore. Unfortunately, changing the exception list doesn't seem to make a difference. I tried:
store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=[BaseException])
to try and just catch anything, but I still get the same assertion failure as above.
Do you have any further advice for where this could be being handled?
Hi @samueljackson92. I think you want the reverse, i.e. you want the PermissionErrors thrown, i.e.:
exceptions=(KeyError, IOError)
@joshmoore ah sorry, my mistake!
Unfortunately, I still see the same issue. I tried both:
store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=(KeyError, IOError))
and
store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=())
Both of these also just return a zero-filled array with no error.
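For reference, the two tuples tried above are not equivalent in plain Python 3: `IOError` is an alias of `OSError`, and `PermissionError` is a subclass of `OSError`, so a tuple that contains `IOError` still matches a `PermissionError`. The sketch below shows this with two hypothetical classes (`DeniedStore`, `TolerantStore`) and a hypothetical `read_or_fill` helper. It only imitates the idea of turning listed exceptions into a missing key and is not FSStore's implementation.

```python
class DeniedStore:
    """Hypothetical store in which every read is refused."""

    def __getitem__(self, key):
        raise PermissionError(f"access denied: {key}")


class TolerantStore:
    """Hypothetical wrapper that reports the listed exceptions as a missing key."""

    def __init__(self, inner, exceptions):
        self.inner = inner
        self.exceptions = tuple(exceptions)

    def __getitem__(self, key):
        try:
            return self.inner[key]
        except self.exceptions as err:
            raise KeyError(key) from err


def read_or_fill(store, key, fill_value=0):
    """Return the stored value, or the fill value when the key is reported missing."""
    try:
        return store[key]
    except KeyError:
        return fill_value


denied = DeniedStore()

# IOError is OSError, so PermissionError is still swallowed and the fill value comes back.
with_ioerror = read_or_fill(TolerantStore(denied, (KeyError, IOError)), "28412/data/0")

# With an empty tuple nothing is swallowed and the PermissionError propagates.
try:
    read_or_fill(TolerantStore(denied, ()), "28412/data/0")
    raised = False
except PermissionError:
    raised = True
```

In this sketch an empty tuple lets the error through. Since the real store still returned zeros with `exceptions=()`, the error is presumably being swallowed somewhere other than this argument.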
Doh! Ok. Either it requires something additional or it was a different issue to begin with. In looking around, I see https://github.com/zarr-developers/zarr-python/pull/1237 -- could you see if that helps? If not, https://github.com/zarr-developers/zarr-python/pull/489 is another candidate.
@joshmoore thanks for the signposting! I will investigate the two suggested PRs when I have a little more time and see if they help.