zarr-python

Accessing zarr group inside jupyter yields zero values and len(group) = 0

Open AdemFr opened this issue 4 years ago • 4 comments

Minimal, reproducible code sample, a copy-pastable example if possible

import numpy as np
import zarr

store = zarr.open_group("gs://my_bucket/remote_group.zarr", mode="r")
for idx in ("group_1", "group_2"):
    nested_group = store[idx]["nested"]["group"]
    my_array = nested_group["array_in_group"]
    print("Len: ", len(nested_group), "; Unique Values: ", np.unique(np.array(my_array)))

# Output:
# Len:  0 ; Unique Values:  [0]
# Len:  0 ; Unique Values:  [0]
# Len:  0 ; Unique Values:  [0]

After copying the whole zarr store locally with gsutil cp gs://my_bucket/remote_group.zarr . and running the same code with only the path changed, I get the expected output with the correct values:

import numpy as np
import zarr

store = zarr.open_group("./remote_group.zarr", mode="r")  # Only change
for idx in ("group_1", "group_2"):
    nested_group = store[idx]["nested"]["group"]
    my_array = nested_group["array_in_group"]
    print("Len: ", len(nested_group), "; Unique Values: ", np.unique(np.array(my_array)))

# Output:
# Len:  1 ; Unique Values:  [0 7]
# Len:  1 ; Unique Values:  [0]
# Len:  1 ; Unique Values:  [ 0  1  2  3  6  7  8  9 10]

Problem description

Accessing the same zarr group in a remote Google Cloud Storage bucket gives zeros as output when run in Jupyter Lab or Jupyter Notebook. I tested this in several setups: local access to remote storage, and access from a remote Kubernetes pod in the same GCP project with access to the bucket.

Cases tested:

  1. Access remote zarr from local jupyter / jupyter lab --> Doesn’t work
  2. Access remote zarr from remote jupyter / jupyter lab (kubernetes pod) --> Doesn’t work
  3. Access local zarr (copied with gsutil) from local jupyter --> Works
  4. Access remote zarr from a remote run (kubernetes pod) inside python my_zarr_access_file.py --> Works
  5. Access remote zarr from local run (same venv as jupyter) inside python my_zarr_access_file.py --> Works
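
One way to narrow down cases 1 and 2 is to check what the store itself exposes under the affected group path, independent of len(group). The sketch below is not part of the original report. It assumes the zarr v2 layout, in which a store behaves like a mapping from string keys to bytes, groups and arrays are marked by ".zgroup" and ".zarray" keys, and chunks whose keys are absent are read back as the array's fill value. The function name summarize_group_keys and the demo keys are hypothetical. Iterating over a large remote store can be slow, so this is meant for small test stores.

```python
from collections.abc import Mapping

# Metadata documents that mark a zarr v2 array or group.
NODE_MARKERS = (".zarray", ".zgroup")


def summarize_group_keys(store: Mapping, group_path: str) -> dict:
    """Summarize the keys a mapping-like store exposes under one group path."""
    path = group_path.strip("/")
    prefix = path + "/" if path else ""
    keys = [k for k in store if k.startswith(prefix)]
    relative = [k[len(prefix):] for k in keys]

    # A direct child shows up as "<child>/.zarray" or "<child>/.zgroup".
    children = sorted(
        {
            r.split("/")[0]
            for r in relative
            if r.count("/") == 1 and r.split("/")[1] in NODE_MARKERS
        }
    )
    # Everything whose last path component is not a ".z*" document is chunk data.
    chunk_keys = [r for r in relative if not r.rsplit("/", 1)[-1].startswith(".z")]
    return {
        "total_keys": len(keys),
        "children": children,
        "chunk_keys": len(chunk_keys),
    }


# Hypothetical in-memory stand-in for a store, using the paths from the report.
demo_store = {
    "group_1/nested/group/.zgroup": b"{}",
    "group_1/nested/group/array_in_group/.zarray": b"{}",
    "group_1/nested/group/array_in_group/0.0": b"chunk",
    "group_1/nested/group/array_in_group/0.1": b"chunk",
    "group_2/nested/group/.zgroup": b"{}",
}

print(summarize_group_keys(demo_store, "group_1/nested/group"))
```

Running the same function against the underlying store of the opened group (for example store.store in zarr 2.x) from both Jupyter and a plain python script would show whether the two environments see the same keys. If Jupyter reports no children and no chunk keys, that would be consistent with both symptoms, len(group) = 0 and all-zero values, but it would not by itself explain why the listing differs.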

Version and installation information

Please provide the following:

  • Value of zarr.__version__: Tested '2.7.0' (kubernetes pod) and '2.6.1' (local venv)
  • Value of numcodecs.__version__: Tested '0.7.3' (kubernetes pod and local venv)
  • Version of Python interpreter: Tested 3.8.3 (kubernetes pod) and 3.7.6 (local venv)
  • Operating system (Linux/Windows/Mac): Linux (kubernetes pod), Mac (local venv)
  • How Zarr was installed: "pip into venv" (local) / "pip inside conda env (kubernetes pod)"
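
Because the failing and working cases can share the same venv, it may help to record the versions from inside the Jupyter kernel and from the plain python run and compare them. The sketch below is an illustration rather than something from the report. The helper name collect_versions is hypothetical, and including fsspec and gcsfs is an assumption based on gs:// URLs usually being served through those packages in zarr 2.x.

```python
import importlib
import sys


def collect_versions(package_names):
    """Return interpreter details and the __version__ of each importable package."""
    versions = {
        "python": sys.version.split()[0],
        "executable": sys.executable,  # shows which interpreter the kernel really uses
    }
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# fsspec and gcsfs are assumed here; the original report only lists zarr and numcodecs.
for key, value in collect_versions(["zarr", "numcodecs", "fsspec", "gcsfs"]).items():
    print(f"{key}: {value}")
```

If the executable path or any version differs between the notebook kernel and the script run, the two are not actually using the same environment, which would be worth ruling out first.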

AdemFr, Mar 30 '21 09:03

@AdemFr : you may need to configure CORS on your bucket. See https://github.com/alimanfoo/zarrita/issues/32#issuecomment-733243528

joshmoore, Mar 30 '21 10:03

@joshmoore Thanks for the quick reply! This unfortunately did not solve the issue.

AdemFr, Mar 30 '21 10:03

@AdemFr : did you ever have success here?

joshmoore, Sep 22 '21 13:09

@joshmoore Unfortunately not, sorry. We have since built a workaround, so these len calls are not directly relevant to me right now, but we could not find the root cause.

AdemFr, Sep 22 '21 13:09