
More efficient `Group.arrays()` for v3 stores

Open dcherian opened this issue 1 year ago • 0 comments

Zarr version

v2.16.1

Numcodecs version

n/a

Python Version

3.11

Operating System

mac

Installation

conda

Description

Xarray uses `.arrays()` to iterate over the arrays in a group: https://github.com/pydata/xarray/blob/4a0bb2eb80538806468233d11bc5a4c06ffb417e/xarray/backends/zarr.py#L539

The implementation is a serial for loop that requests .array.json and constructs the Zarr array to return: https://github.com/zarr-developers/zarr-python/blob/6fe553df925c224fcc0a12ecdd074997ce9e56f7/zarr/hierarchy.py#L682-L686

It would be nice to be more efficient here when opening a store with O(100) variables on cloud object storage, where each serial metadata request costs a full network round trip.

Here's one idea that comes to mind. This code already knows which JSON files it needs (from the `listdir` call). That means Zarr could request all of the JSON documents at once using `store.getitems`, and use those to construct the array objects.

I don't immediately see how to enable this though. Perhaps there are other solutions.
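To make the batching idea concrete, here is a minimal sketch. It does not use Zarr's actual classes: `CountingStore` is a hypothetical dict-backed stand-in that only counts round trips, its `getitems` takes a plain list of keys (the real store method may require extra arguments), and the `meta/root/<name>.array.json` key layout is assumed for illustration.

```python
import json
from typing import Dict, Iterable, Mapping


class CountingStore:
    """Hypothetical dict-backed store that counts round trips (not a Zarr class)."""

    def __init__(self, data: Mapping[str, bytes]) -> None:
        self._data = dict(data)
        self.round_trips = 0

    def __getitem__(self, key: str) -> bytes:
        # Each single-key read models one request to object storage.
        self.round_trips += 1
        return self._data[key]

    def getitems(self, keys: Iterable[str]) -> Dict[str, bytes]:
        # The whole batch is modeled as one round trip; missing keys are skipped.
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


def load_metadata_serial(store: CountingStore, keys: Iterable[str]) -> Dict[str, dict]:
    # Mirrors the current behavior described above: one request per array.
    return {k: json.loads(store[k]) for k in keys}


def load_metadata_batched(store: CountingStore, keys: Iterable[str]) -> Dict[str, dict]:
    # Proposed behavior: fetch every metadata document in a single call.
    raw = store.getitems(keys)
    return {k: json.loads(v) for k, v in raw.items()}


# Assumed key layout, for illustration only.
keys = [f"meta/root/var{i}.array.json" for i in range(100)]
data = {k: json.dumps({"shape": [10]}).encode() for k in keys}

serial_store = CountingStore(data)
serial_meta = load_metadata_serial(serial_store, keys)

batched_store = CountingStore(data)
batched_meta = load_metadata_batched(batched_store, keys)
```

In this toy model the serial path costs one round trip per variable while the batched path costs one in total. The open question from the description remains how to hand the prefetched JSON to the array constructor so that it does not fetch the metadata again.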

Steps to reproduce

n/a

Additional output

No response

dcherian, Mar 20 '24 15:03