Zarr over https returns an empty string in `keys()`
Zarr version
3.1.3
Numcodecs version
0.15.1
Python Version
3.12.9
Operating System
Mac
Installation
uv pip install zarr
Description
The following code:
import zarr, numcodecs
z = zarr.open('https://data.ecmwf.int/anemoi-datasets/era5-o96-1979-2023-6h-v8.zarr', mode='r')
print(list(z.keys()))
will show that one of the keys is the empty string ''. This is likely because the directory listing links to itself.
See https://data.ecmwf.int/anemoi-datasets/era5-o96-1979-2023-6h-v8.zarr
Steps to reproduce
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
import zarr
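# Reproducer (same code as in the description above)
z = zarr.open('https://data.ecmwf.int/anemoi-datasets/era5-o96-1979-2023-6h-v8.zarr', mode='r')
print(list(z.keys()))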
# zarr.print_debug_info()
Additional output
No response
Directory listing over HTTP is not, to my knowledge, standardized, so we are relying on fsspec's heuristics, which IIRC involve assuming that a request against "path/" follows the static file server convention of returning a list of links to path/a, path/b, etc. I'm not sure whether this is something zarr or fsspec should fix. On our side, we could develop a dedicated storage backend for HTTP that deals with this.
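To make the suspected mechanism concrete, here is a minimal sketch of how a link-scraping heuristic can produce an empty name when a listing page links to itself. This is not fsspec's actual implementation; `LinkCollector`, `list_children`, the example URL, and the HTML snippet are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def list_children(base_url, html):
    """Return link targets below base_url, relative to it (illustrative only)."""
    parser = LinkCollector()
    parser.feed(html)
    base = base_url.rstrip("/") + "/"
    names = []
    for href in parser.hrefs:
        absolute = urljoin(base, href)
        # Keep only links that point at or below the directory itself.
        if absolute.startswith(base):
            names.append(absolute[len(base):].strip("/"))
    return names


# Made-up listing: a self-link, a parent link, and two real entries.
BASE = "https://example.invalid/store.zarr"
HTML = (
    '<a href="./">.</a>'
    '<a href="../">..</a>'
    '<a href="data/">data</a>'
    '<a href="zarr.json">zarr.json</a>'
)

names = list_children(BASE, HTML)
print(names)
```

In this sketch the self-link "./" resolves to the directory URL itself, so its name relative to the directory is the empty string, which would then surface as a key.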
I understand that. I just wanted to report it. I'll exclude the empty string in my code.
We should definitely not be conveying the existence of keys that do not exist, so as a short-term fix we should probably sanitize the results of directory listing.
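A rough sketch of what such sanitizing could look like, assuming the raw listing is just an iterable of strings. `sanitize_keys` is a hypothetical helper, not an existing zarr or fsspec function, and where exactly it would hook into the store is left open.

```python
from collections.abc import Iterable, Iterator


def sanitize_keys(raw_keys: Iterable[str]) -> Iterator[str]:
    """Drop entries that cannot be real children (hypothetical helper).

    Removes empty names and the "." / ".." pseudo-entries, strips
    surrounding slashes, and de-duplicates while preserving order.
    """
    seen = set()
    for key in raw_keys:
        name = key.strip("/")
        if not name or name in (".", ".."):
            continue
        if name in seen:
            continue
        seen.add(name)
        yield name


# Made-up raw listing for illustration.
cleaned = list(sanitize_keys(["", "data", "data/", "..", "latitudes"]))
print(cleaned)
```

Until something like this lands, the user-side equivalent is simply `[k for k in z.keys() if k]`.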
Please note that zarr 2.18.7 does not return that empty string.
However, zarr 2 did return .zgroup among the keys. I expect keys() to return all arrays and sub-groups.
one step forward, one step back