Accessing zarr group inside jupyter yields zero values and len(group) = 0
Minimal, reproducible code sample, a copy-pastable example if possible
import numpy as np
import zarr
store = zarr.open_group("gs://my_bucket/remote_group.zarr", mode="r")
for idx in ("group_1", "group_2"):
    nested_group = store[idx]["nested"]["group"]
    my_array = nested_group["array_in_group"]
    print("Len: ", len(nested_group), "; Unique Values: ", np.unique(np.array(my_array)))
# Output:
# Len: 0 ; Unique Values: [0]
# Len: 0 ; Unique Values: [0]
# Len: 0 ; Unique Values: [0]
When copying the whole zarr store with gsutil cp gs://my_bucket/remote_group.zarr . and running the same code again while only changing the path, I get the desired output with the correct values:
import numpy as np
import zarr
store = zarr.open_group("./remote_group.zarr", mode="r") # Only change
for idx in ("group_1", "group_2"):
    nested_group = store[idx]["nested"]["group"]
    my_array = nested_group["array_in_group"]
    print("Len: ", len(nested_group), "; Unique Values: ", np.unique(np.array(my_array)))
# Output:
# Len: 1 ; Unique Values: [0 7]
# Len: 1 ; Unique Values: [0]
# Len: 1 ; Unique Values: [ 0 1 2 3 6 7 8 9 10]
Problem description
Accessing the same zarr group on a remote Google Cloud Storage bucket gives only zeros as output when run in jupyter lab or jupyter notebook. I tested this in a number of different cases: local access to the remote storage, and access from a remote kubernetes pod in the same GCP project that has access to the bucket.
Cases tested:
- Access remote zarr from local jupyter / jupyter lab --> Doesn’t work
- Access remote zarr from remote jupyter / jupyter lab (kubernetes pod) --> Doesn’t work
- Access local zarr (copied with gsutil) from local jupyter --> Works
- Access remote zarr from a remote run (kubernetes pod) inside python my_zarr_access_file.py --> Works
- Access remote zarr from local run (same venv as jupyter) inside python my_zarr_access_file.py --> Works
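One way to narrow this down might be to compare which keys the store actually lists in the failing and the working environment. As far as I understand Zarr v2, len(group) is derived from listing the store, and chunks that cannot be found are read back as the fill value, so a listing that comes back empty could produce both symptoms. The sketch below is only an illustration: keys_under and chunk_keys are hypothetical helpers, not part of zarr, and the plain dict stands in for a real store.

```python
from collections.abc import Mapping

# Metadata file names used by the Zarr v2 layout.
ZARR_V2_METADATA = {".zarray", ".zattrs", ".zgroup", ".zmetadata"}


def keys_under(store: Mapping, prefix: str) -> list:
    """Return the sorted store keys below `prefix` (hypothetical helper)."""
    prefix = prefix.strip("/") + "/"
    return sorted(key for key in store if key.startswith(prefix))


def chunk_keys(store: Mapping, array_path: str) -> list:
    """Return the keys below `array_path` that are not metadata files."""
    return [
        key
        for key in keys_under(store, array_path)
        if key.rsplit("/", 1)[-1] not in ZARR_V2_METADATA
    ]


# Stand-in for a real store: a plain dict laid out like a Zarr v2 hierarchy.
demo_store = {
    ".zgroup": b"{}",
    "group_1/.zgroup": b"{}",
    "group_1/nested/.zgroup": b"{}",
    "group_1/nested/group/.zgroup": b"{}",
    "group_1/nested/group/array_in_group/.zarray": b"{}",
    "group_1/nested/group/array_in_group/0.0": b"\x00",
}

# Everything the store lists below the nested group.
print(keys_under(demo_store, "group_1/nested/group"))
# Only the chunk keys of the array, with metadata files filtered out.
print(chunk_keys(demo_store, "group_1/nested/group/array_in_group"))
```

With a real group, the underlying store should be reachable as store.store in the snippets above and could be passed in place of demo_store. Iterating a remote store lists every key, so this may be slow on a large bucket. If the listing is empty in jupyter but populated in a plain python run, the problem is probably in how the store is listed rather than in the data itself.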
Version and installation information
Please provide the following:
- Value of zarr.__version__: Tested '2.7.0' (kubernetes pod) and '2.6.1' (local venv)
- Value of numcodecs.__version__: Tested '0.7.3' (kubernetes pod and local venv)
- Version of Python interpreter: Tested 3.8.3 (kubernetes pod) and 3.7.6 (local venv)
- Operating system (Linux/Windows/Mac): Linux (kubernetes pod), Mac (local venv)
- How Zarr was installed: "pip into venv" (local) / "pip inside conda env (kubernetes pod)"
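Since the same code works from a plain python run but not from jupyter, it may be worth confirming that the notebook kernel really uses the same interpreter and package versions. The sketch below prints a small report that can be run in both places and compared. environment_report is a hypothetical helper, and fsspec and gcsfs are only my assumption about which packages serve the gs:// URL here. It relies on importlib.metadata, which needs Python 3.8 or newer (the 3.7 venv would need the importlib_metadata backport instead).

```python
import sys
from importlib import metadata


def environment_report(packages=("zarr", "numcodecs", "fsspec", "gcsfs")):
    """Collect interpreter details and installed versions (hypothetical helper)."""
    report = {
        "executable": sys.executable,
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # None marks a package that is not installed in this environment.
            report[name] = None
    return report


# Run this once in a notebook cell and once as a script, then compare.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```

A differing executable path, or a package that is present in one report and None in the other, would suggest that the kernel is not running in the venv it is assumed to use.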
@AdemFr : you may need to configure CORS on your bucket. See https://github.com/alimanfoo/zarrita/issues/32#issuecomment-733243528
@joshmoore Thanks for the quick reply! This unfortunately did not solve the issue.
@AdemFr : did you ever have success here?
@joshmoore Unfortunately not, sorry. We have since built a workaround, so these len calls are not directly relevant to me right now, but we could not find the real cause.