zarr-python

Data view / slice of zarr array without loading entire array

Open aliaksei-chareshneu opened this issue 3 years ago • 4 comments

Dear all,

Could you please tell me how to get a data view of a zarr array? Performance is the key concern.

From the docs, it looks like there are two options:

  • Use getitem via ":" notation (store is existing DirectoryStore, there is one group 'sgroup' and one 3D array 'sarr')
root = zarr.group(store=store)
arr = root.sgroup.sarr
region = arr[1:3, 1:3, 1:3]  # named "region" so the built-in slice() is not shadowed
  • Use get_basic_selection
root = zarr.group(store=store)
arr = root.sgroup.sarr
region = arr.get_basic_selection((slice(1, 3), slice(1, 3), slice(1, 3)))  # the selection is passed as a single tuple

In general, what is the difference between them? Would both options indeed return the slice without loading the entire array? Are there better alternatives in terms of performance?

Best regards, Aliaksei

  • Value of zarr.__version__: 2.10.3
  • Value of numcodecs.__version__: 0.9.1
  • Version of Python interpreter: 3.8.2
  • Operating system (Linux/Windows/Mac): Windows 7
  • How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): using pip into virtual environment

aliaksei-chareshneu avatar Mar 05 '22 10:03 aliaksei-chareshneu

zarr-python should work hard not to load the entire array, but will actively load the individual chunks. If you want to defer even that, you might want to look into combining it with dask.
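To make "load the individual chunks" concrete, here is a rough sketch in plain Python of which chunks a slice selection overlaps. It is not zarr's actual implementation; the helper name chunks_touched is hypothetical, and it assumes non-empty slices with explicit start and stop and a step of 1.

```python
from itertools import product


def chunks_touched(selection, chunk_shape):
    """Return the grid indices of all chunks overlapped by a tuple of slices.

    Hypothetical helper for illustration only: assumes each slice is
    non-empty, has explicit start/stop, and uses a step of 1.
    """
    per_axis = []
    for sl, size in zip(selection, chunk_shape):
        first = sl.start // size        # chunk holding the first selected index
        last = (sl.stop - 1) // size    # chunk holding the last selected index
        per_axis.append(range(first, last + 1))
    # Every combination of per-axis chunk indices is one chunk to read.
    return list(product(*per_axis))


# The selection from the question, with an assumed chunk shape of (5, 5, 5).
inside_one_chunk = chunks_touched((slice(1, 3),) * 3, (5, 5, 5))

# A selection that straddles a chunk boundary on every axis.
across_boundaries = chunks_touched((slice(4, 6),) * 3, (5, 5, 5))

print(len(inside_one_chunk), len(across_boundaries))
```

Under these assumptions, the cost of a read depends on how many chunks the selection overlaps rather than on the total array size, so choosing a chunk shape that matches the typical access pattern matters more than which of the two indexing calls is used.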

The recent release of 2.11 should also allow some slightly fancier indexing: https://zarr.dev/blog/release-2-11/

joshmoore avatar Mar 07 '22 19:03 joshmoore

You might be interested in TensorStore, which can do lazy indexing of Zarr arrays: https://github.com/google/tensorstore

Xarray also has its own lazy indexing that works on top of Zarr (with or without Dask).

shoyer avatar Mar 07 '22 19:03 shoyer

zarr-python should work hard not to load the entire array, but will actively load the individual chunks. If you want to defer even that, you might want to look into combining it with dask.

The recent release of 2.11 should also allow some slightly fancier indexing: https://zarr.dev/blog/release-2-11/

@joshmoore, thank you! I had a look. It seems to be just syntactic sugar (like dropping 'vindex'); or are there performance benefits too?

aliaksei-chareshneu avatar Mar 08 '22 14:03 aliaksei-chareshneu

This is related to #843.

I would also note that it has been proposed to factor Xarray's lazy indexing classes into a standalone package (https://github.com/pydata/xarray/issues/5081).

rabernat avatar Mar 08 '22 14:03 rabernat

Adding the documentation label in case we want to close this by adding pointers to the documentation (and/or tutorial items) on how this can be done with other libraries. If someone feels there's a feature request looming, please say the word.

joshmoore avatar Dec 02 '22 16:12 joshmoore