
Failure to list keys on remote HTTP store

Open ianhi opened this issue 3 months ago • 2 comments

Zarr version

main

Numcodecs version

0.16.3

Python Version

3.13

Operating System

mac

Installation

pep-723

Description

Opening a remote Zarr store over HTTPS can silently fail to list keys.

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
#   "fsspec",
#   "requests",
#   "aiohttp",
#   "ome-zarr"
# ]
# ///
"""
Minimal reproducer for Zarr remote vs local group listing issue.

This demonstrates that zarr.Group.keys() returns empty list for remote
stores (FsspecStore over HTTP) but works correctly for local stores,
even though direct access (group['0']) works in both cases.

"""

from pathlib import Path

import zarr

url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr"
download_dir = Path("data")
local_path = download_dir / "6001240_labels.zarr"

if not local_path.exists():
    import ome_zarr.utils

    print(f"Downloading {url}...")
    download_dir.mkdir(parents=True, exist_ok=True)
    ome_zarr.utils.download(url, str(download_dir))
    print("Download complete!")
print("\n\n\n")

# actual reproducer

zarr.print_debug_info()
# Test Remote
print("\n1. REMOTE store (FsspecStore over HTTP)")
remote_group = zarr.open_group(url, mode="r")
remote_keys = list(remote_group.keys())
print(f"   keys() → {remote_keys}")
print(f"   Direct access group['0'] → {type(remote_group['0']).__name__}")

# Test Local
print("\n2. LOCAL store (LocalStore)")
print(f"   Path: {local_path}")

local_group = zarr.open_group(str(local_path), mode="r")
local_keys = list(local_group.keys())
print(f"   keys() → {local_keys}")
print(f"   Direct access group['0'] → {type(local_group['0']).__name__}")

# Show the bug
print("\n" + "=" * 80)
print("BUG: Remote keys() returns empty but direct access works!")
print("=" * 80)
print(f"Remote keys(): {remote_keys} (WRONG - should match local)")
print(f"Local keys():  {local_keys}")
print(f"\nBoth can access group['0']: ✓")
print("\nThis breaks xarray when iterating groups for DataTree.")

Additional output



1. REMOTE store (FsspecStore over HTTP)
   keys() → []
   Direct access group['0'] → Array

2. LOCAL store (LocalStore)
   Path: data/6001240_labels.zarr
   keys() → ['0', '1', 'labels', '2']
   Direct access group['0'] → Array

================================================================================
BUG: Remote keys() returns empty but direct access works!
================================================================================
Remote keys(): [] (WRONG - should match local)
Local keys():  ['0', '1', 'labels', '2']

Both can access group['0']: ✓

This breaks xarray when iterating groups for DataTree.

ianhi, Sep 30 '25 18:09

I think a big part of the issue here is that the URL points to an S3 store, not an HTTPS store, but it's being treated as an HTTP store by url_to_fs from fsspec.

Two action items, I think:

  1. If we end up in this state, zarr should fail properly instead of just returning nothing.
  2. Better detection of this as S3?

ianhi, Sep 30 '25 18:09
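
As a possible workaround while detection is discussed, the URL can be rewritten by hand into an s3:// URL plus an explicit endpoint. The following is a minimal sketch, not part of zarr or fsspec: the helper name split_s3_style_url is hypothetical, and it assumes the server uses path-style addressing (the first path segment is the bucket) and allows anonymous access.

```python
from urllib.parse import urlsplit


def split_s3_style_url(url: str) -> tuple[str, str, str]:
    """Split a path-style S3-over-HTTP(S) URL into (endpoint_url, bucket, key).

    Hypothetical helper: assumes the first path segment is the bucket name.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    # "/idr/zarr/..." -> bucket "idr", key "zarr/..."
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError(f"no bucket segment in URL path: {url!r}")
    return f"{parts.scheme}://{parts.netloc}", bucket, key.rstrip("/")


url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr"
endpoint_url, bucket, key = split_s3_style_url(url)

s3_url = f"s3://{bucket}/{key}"
# Assumption: the bucket is publicly readable, hence anon=True.
storage_options = {"anon": True, "endpoint_url": endpoint_url}

print(s3_url)
print(storage_options)
```

With s3fs installed, these values could presumably be passed as zarr.open_group(s3_url, mode="r", storage_options=storage_options), so that listing goes through the S3 API rather than plain HTTP. That call has not been verified against this dataset here.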

In general, HTTP-backed storage is not guaranteed to support directory listing. But for OME-Zarr this is not so important, because the multiscales attribute of an OME-Zarr multiscale group contains the names of all the scale levels, so you don't actually need directory listing to traverse the Zarr nodes relevant to the format.

d-v-b, Oct 01 '25 07:10
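
The point about the multiscales attribute can be sketched as follows. This is an illustrative helper, not an API of zarr or ome-zarr: the name multiscale_paths is hypothetical, and the example attributes are hand-written rather than copied from the dataset in this issue. It assumes OME-Zarr 0.5 nests its metadata under an "ome" key, while earlier versions keep "multiscales" at the top level of the group attributes.

```python
from typing import Any, Mapping


def multiscale_paths(attrs: Mapping[str, Any]) -> list[str]:
    """Collect the scale-level array paths declared in OME-Zarr multiscales metadata."""
    # OME-Zarr 0.5 wraps metadata in "ome"; fall back to top-level for older versions.
    ome = attrs.get("ome", attrs)
    paths: list[str] = []
    for multiscale in ome.get("multiscales", []):
        for dataset in multiscale.get("datasets", []):
            paths.append(dataset["path"])
    return paths


# Hand-written example attributes (an assumption, not the real dataset's metadata).
example_attrs = {
    "ome": {
        "version": "0.5",
        "multiscales": [
            {"datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}]}
        ],
    }
}

paths = multiscale_paths(example_attrs)
print(paths)
```

Against a real group, something like multiscale_paths(dict(remote_group.attrs)) would give names that can be opened with remote_group[name] without any directory listing. It covers only the scale-level arrays; the labels subgroup seen in the local listing is not part of multiscales.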