
Accessing restricted groups over s3 produces zero filled array

Open samueljackson92 opened this issue 2 years ago • 6 comments

Zarr version

2.15.0

Numcodecs version

0.11.0

Python Version

3.11.4

Operating System

Linux/Ubuntu

Installation

pip install zarr

Description

Hi,

I am attempting to read a consolidated zarr file with lots of groups remotely from a minio s3 bucket. Everything works nicely with anonymous access. However, when I restrict some groups using the minio ACL tools, Zarr returns a blank, zero-filled array with no error. I would expect Zarr/fsspec to throw a "permission denied" or similar error.

For example, I have the following structure, and want to deny anonymous access to group 28412.

amc_test.zarr
├── .zgroup
├── .zmetadata
├── 28412               ---> Access is denied in s3 ACL to this group and below.
│   ├── .zgroup
│   ├── data
│   │   ├── .zarray
│   │   └── 0
│   ├── error
│   │   ├── .zarray
│   │   └── 0
│   └── time
│       ├── .zarray
│       └── 0
├── 28415              ---> Still want public access to this group
│   ├── .zgroup
│   ├── data
│   │   ├── .zarray
│   │   └── 0
│   ├── error
│   │   ├── .zarray
│   │   └── 0
│   └── time
│       ├── .zarray
│       └── 0

I can set the corresponding ACL to deny anonymous access on that particular group in minio. But when I try to read the data, instead of getting an error, I get a blank array filled with zeros. I guess Zarr sees that the file is not readable and assumes that it doesn't exist, rather than throwing an error.

Steps to reproduce

Here is a simple example of how I am accessing the data from s3. The minio bucket is at localhost:10101. If I turn off the ACL in minio then this script runs without errors. With the ACL enabled, I would expect an error to be thrown on the penultimate line. Instead I get a blank zero-filled array.

import s3fs
import zarr
import numpy as np

file_name = 'amc_test.zarr'

s3 = s3fs.S3FileSystem(
    anon=True,
    use_ssl=False,
    client_kwargs={
        "endpoint_url": "http://localhost:10101"
    },
)

store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3)
handle = zarr.open_consolidated(store)
arr = handle['28415']['data'][:]        # <-- This works as expected. There is no access control on this group.
assert not np.all(arr == 0)

arr = handle['28412']['data'][:]        # <-- I expected an error to be thrown here.
assert not np.all(arr == 0)             # <-- This assert fails.

Additional output

No response

samueljackson92 avatar Aug 18 '23 09:08 samueljackson92

Thanks for the clear issue, @samueljackson92. My assumption is you are running into the inverse problem that I had -- https://github.com/fsspec/filesystem_spec/issues/342 -- i.e. some implementations were raising a 403 on chunks leading to application errors. Could you try configuring the exceptions at:

https://github.com/zarr-developers/zarr-python/pull/546/files#diff-565e487a2f60258b6baa2e4db8ef175cc16b8a949651834bd43d0a9f21e07358R974

exceptions=(KeyError, PermissionError, IOError)

such that you see the behavior you are looking for? It might be that I was looking for things to be too tolerant, but you raise a good point.
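To make the idea concrete, here is a toy model of the mechanism under discussion. It is a sketch under stated assumptions, not zarr's actual code: `DeniedStore`, `TranslatingStore` and `read_chunk` are hypothetical names, and the model only assumes that a store re-raises the listed exceptions as `KeyError` and that a reader treats `KeyError` as "chunk never written".

```python
class DeniedStore:
    """Hypothetical stand-in for a remote store with one access-controlled prefix."""

    def __init__(self, data, denied_prefix):
        self._data = data
        self._denied_prefix = denied_prefix

    def __getitem__(self, key):
        if key.startswith(self._denied_prefix):
            raise PermissionError(key)
        return self._data[key]


class TranslatingStore:
    """Toy wrapper: any exception listed in `exceptions` is re-raised as KeyError."""

    def __init__(self, inner, exceptions=(KeyError, PermissionError, IOError)):
        self._inner = inner
        self._exceptions = tuple(exceptions)

    def __getitem__(self, key):
        try:
            return self._inner[key]
        except self._exceptions as exc:
            raise KeyError(key) from exc


def read_chunk(store, key, chunk_len, fill_value=0):
    """Toy chunk reader: a KeyError is treated as a missing chunk and filled."""
    try:
        return list(store[key])
    except KeyError:
        return [fill_value] * chunk_len


# Two chunks, laid out like the groups in this issue; 28412/ is denied.
backing = {"28415/data/0": b"\x01\x02", "28412/data/0": b"\x03\x04"}
remote = DeniedStore(backing, denied_prefix="28412/")

tolerant = TranslatingStore(remote)                      # swallows the denial
strict = TranslatingStore(remote, exceptions=(KeyError,))  # lets it propagate
```

In this model, `read_chunk(tolerant, "28412/data/0", 2)` silently yields fill values, while the same call on `strict` raises `PermissionError`. One detail worth keeping in mind when picking the tuple: in Python 3, `IOError` is an alias of `OSError` and `PermissionError` is a subclass of it, so a tuple that contains `IOError` still matches a permission failure.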

joshmoore avatar Aug 24 '23 17:08 joshmoore

Hi @joshmoore, thanks for your help. I hadn't noticed the options for tweaking exceptions in FSStore. Unfortunately, changing the exception list doesn't seem to make a difference. I tried:

store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=[BaseException])

to try and just catch anything, but I still get the same assertion failure as above.

Do you have any further advice for where this could be being handled?

samueljackson92 avatar Aug 25 '23 08:08 samueljackson92

Hi @samueljackson92. I think you want the reverse: you want the PermissionErrors to be thrown, i.e.:

exceptions=(KeyError, IOError)

joshmoore avatar Aug 25 '23 08:08 joshmoore

@joshmoore ah sorry, my mistake!

Unfortunately, I still see the same issue. I tried both:

store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=(KeyError, IOError))

and

store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=())

Both of these also just return a zero-filled array with no error.
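One way to narrow this down is to read a chunk object directly through the filesystem, bypassing the store, to see whether the denial surfaces at that layer at all. The helper below is a minimal sketch: `probe_key` is a hypothetical name, and it assumes the filesystem reports a denied read as `PermissionError` and a missing object as `FileNotFoundError`. It only relies on the fsspec-style `cat_file(path)` method.

```python
def probe_key(fs, path):
    """Read one object directly via the filesystem and report what happened.

    `fs` is any fsspec-style filesystem exposing `cat_file(path)`.
    Returns a short status string instead of raising.
    """
    try:
        fs.cat_file(path)
    except PermissionError:
        return "denied"      # assumption: a 403 is surfaced as PermissionError
    except FileNotFoundError:
        return "missing"
    return "readable"
```

With the setup from the script above, this could be called as `probe_key(s3, 'mast/amc_test.zarr/28412/data/0')`. If that reports "denied" while the array read still returns zeros, the error is being dropped somewhere between the filesystem and the array, not by minio or s3fs.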

samueljackson92 avatar Aug 25 '23 09:08 samueljackson92

Doh! Ok. Either it requires something additional or it was a different issue to begin with. In looking around, I see https://github.com/zarr-developers/zarr-python/pull/1237 -- could you see if that helps? If not, https://github.com/zarr-developers/zarr-python/pull/489 is another candidate.

joshmoore avatar Aug 25 '23 09:08 joshmoore

@joshmoore thanks for the signposting! I will investigate the two suggested PRs when I have a little more time and see if they help.

samueljackson92 avatar Aug 25 '23 09:08 samueljackson92