zarr-python
Data view / slice of a zarr array without loading the entire array
Dear all,
Could you please tell me how to get a data view of a zarr array? Performance is the key concern.
From the docs, it looks like there are two options:
- Use __getitem__ via ":" notation (store is an existing DirectoryStore containing one group 'sgroup' and one 3D array 'sarr')
root = zarr.group(store=store)
arr = root.sgroup.sarr
sub = arr[1:3, 1:3, 1:3]
- Use get_basic_selection
root = zarr.group(store=store)
arr = root.sgroup.sarr
sub = arr.get_basic_selection((slice(1, 3), slice(1, 3), slice(1, 3)))
In general, what is the difference between them? Would both options really fetch the slice without loading the entire array? Are there better alternatives in terms of performance?
Best regards, Aliaksei
- Value of zarr.__version__: 2.10.3
- Value of numcodecs.__version__: 0.9.1
- Version of Python interpreter: 3.8.2
- Operating system (Linux/Windows/Mac): Windows 7
- How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): using pip into virtual environment
zarr-python should work hard not to load the entire array, but it will eagerly load the individual chunks that the selection touches. If you want to defer even that, you might want to look into combining it with Dask.
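To make "load the individual chunks" concrete, here is a rough sketch of the arithmetic that decides which chunks a basic slice overlaps. It is an illustration only, not zarr-python's internal indexer; the helper name touched_chunks and the (5, 5, 5) chunk shape are assumptions, since the thread does not state the array's chunking.

```python
from itertools import product


def touched_chunks(selection, chunk_shape):
    """Return grid indices of the chunks overlapped by per-axis (start, stop) ranges.

    Hypothetical helper for illustration; assumes step-1 slices with
    non-negative bounds that lie inside the array.
    """
    per_axis = []
    for (start, stop), size in zip(selection, chunk_shape):
        if stop <= start:
            return []  # empty selection touches no chunk
        first = start // size        # chunk holding the first selected index
        last = (stop - 1) // size    # chunk holding the last selected index
        per_axis.append(range(first, last + 1))
    return list(product(*per_axis))


# The slice from the question, 1:3 on each axis, with assumed 5x5x5 chunks.
small = touched_chunks([(1, 3), (1, 3), (1, 3)], (5, 5, 5))

# A slice that straddles a chunk boundary on the first axis.
straddling = touched_chunks([(4, 6), (1, 3), (1, 3)], (5, 5, 5))
```

Under these assumptions the first selection falls inside a single chunk, while the second needs two, even though both select the same number of elements. Aligning chunk shape with typical access patterns is therefore usually what matters most for read performance.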
The recent release of 2.11 should also allow some slightly fancier indexing: https://zarr.dev/blog/release-2-11/
You might be interested in TensorStore, which can do lazy indexing of Zarr arrays: https://github.com/google/tensorstore
Xarray also has its own lazy indexing that works on top of Zarr (with or without Dask).
@joshmoore, thank you! I had a look. It seems to be just syntactic sugar (like dropping 'vindex'), or are there performance benefits too?
This is related to #843.
I would also note that it has been proposed to factor Xarray's lazy indexing classes into a standalone package (https://github.com/pydata/xarray/issues/5081).
Adding the documentation label if we want to close this with an addition of pointers in the documentation of how this can be done with other libraries (and/or tutorial items). If someone feels there's a feature request looming, please say the word.