Error trying to assign slice of a Zarr array to another Zarr array
Zarr version
3.1.6.dev5+gee0e69a74
Numcodecs version
0.6.15
Python Version
3.12.8
Operating System
macOS
Installation
From source
Description
Trying to assign to a slice of a Zarr array with another Zarr array fails. Assigning a float or a NumPy array to the same slice works fine.
Steps to reproduce
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
import zarr
arr1 = zarr.zeros((5, 5))
arr2 = zarr.ones((1, 1))
slc = (slice(1, 2), slice(2, 3))
arr1[slc] = arr2
Additional output
Traceback (most recent call last):
File "/Users/dstansby/software/zarr/zarr-python/test.py", line 7, in <module>
arr1[slc] = arr2
~~~~^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/array.py", line 2975, in __setitem__
self.set_orthogonal_selection(pure_selection, value, fields=fields)
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/array.py", line 3458, in set_orthogonal_selection
return sync(
^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/sync.py", line 159, in sync
raise return_result
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/sync.py", line 119, in _runner
return await coro
^^^^^^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/array.py", line 1736, in _set_selection
await self.codec_pipeline.write(
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/codec_pipeline.py", line 488, in write
await concurrent_map(
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/common.py", line 116, in concurrent_map
return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/common.py", line 114, in run
return await func(*item)
^^^^^^^^^^^^^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/codec_pipeline.py", line 392, in write_batch
self._merge_chunk_array(
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/codec_pipeline.py", line 325, in _merge_chunk_array
chunk_value = value[out_selection]
~~~~~^^^^^^^^^^^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/buffer/cpu.py", line 186, in __getitem__
return self.__class__(np.asanyarray(self._data.__getitem__(key)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/array.py", line 2868, in __getitem__
return self.get_orthogonal_selection(pure_selection, fields=fields)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/array.py", line 3339, in get_orthogonal_selection
return sync(
^^^^^
File "/Users/dstansby/software/zarr/zarr-python/src/zarr/core/sync.py", line 146, in sync
raise SyncError("Calling sync() from within a running loop")
zarr.core.sync.SyncError: Calling sync() from within a running loop
sys:1: RuntimeWarning: coroutine 'AsyncArray._get_selection' was never awaited
this is an interesting bug! I wonder if the correct fix is basically to implement lazy slicing
How would lazy slicing work to solve this?
From my limited dive into the source, the issue is that set_orthogonal_selection spins up an event loop, inside which the data is retrieved from the array being assigned from. If this array is a Zarr array, get_orthogonal_selection is called, which attempts to spin up its own event loop, which isn't allowed (because it's inside an already running event loop).
Casting the array being assigned from to a NumPy array before entering the event loop fixes this, but I haven't thought through whether this 'fix' has any other implications.
diff --git a/src/zarr/core/array.py b/src/zarr/core/array.py
index 6b20ee95..be2f9832 100644
--- a/src/zarr/core/array.py
+++ b/src/zarr/core/array.py
@@ -3455,8 +3455,9 @@ class Array(Generic[T_ArrayMetadata]):
         if prototype is None:
             prototype = default_buffer_prototype()
         indexer = OrthogonalIndexer(selection, self.shape, self.metadata.chunk_grid)
+        data = np.asarray(value)
         return sync(
-            self.async_array._set_selection(indexer, value, fields=fields, prototype=prototype)
+            self.async_array._set_selection(indexer, data, fields=fields, prototype=prototype)
         )
 
     def get_mask_selection(
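For anyone following along, the nested-loop failure and the effect of materialising the value first can be reproduced without Zarr. This is only a sketch using the standard library: `read_chunk`, `blocking_read`, `write_selection` and `blocking_write` are hypothetical stand-ins, and `asyncio.run` is used in place of Zarr's own `sync` helper, which is implemented differently.

```python
import asyncio


async def read_chunk():
    # Stand-in for the async read behind a synchronous __getitem__.
    return [1.0]


def blocking_read():
    # Synchronous wrapper that needs to start an event loop of its own.
    coro = read_chunk()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def write_selection(get_value):
    # Stand-in for the async write: the value is only read here,
    # i.e. while an event loop is already running.
    return get_value()


def blocking_write(get_value):
    return asyncio.run(write_selection(get_value))


# Reading lazily inside the running loop fails, as in the traceback above.
try:
    blocking_write(blocking_read)
    outcome = "ok"
except RuntimeError as exc:
    outcome = f"failed: {exc}"

# Reading eagerly, before the loop starts, avoids the nested call.
eager = blocking_read()
result = blocking_write(lambda: eager)
```

The second half mirrors what the diff does: the value is fully read before the write enters its event loop, so nothing inside the loop needs a blocking read.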
How would lazy slicing work to solve this?
To make lazy slicing work, we would need to model a zarr array as essentially 3 things:
- a metadata document
- a storage backend
- a request for a subset of the array, e.g. a slice
Slicing a Zarr array would just change the last item, which doesn't require any IO (and therefore doesn't kick off async tasks). With this representation, we could defer the actual IO, which would prevent the event loop problems you had here.
Collecting everything in memory as a NumPy array works, but it's inefficient: we lose the chunked representation of the data, which we could use for efficient IO.
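A rough sketch of the three-part model described above, in plain Python. None of this is Zarr API: `LazyArray`, `CountingStore`, `open` and `materialize` are hypothetical names, only slice keys are handled, and `materialize` assumes a 2-D array. The selection is held as one `range` per axis, so slicing only composes ranges and never touches the store.

```python
from dataclasses import dataclass


class CountingStore:
    """Hypothetical storage backend that counts element reads."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def read(self, i, j):
        self.reads += 1
        return self.rows[i][j]


@dataclass(frozen=True)
class LazyArray:
    metadata: dict        # stands in for the metadata document
    store: CountingStore  # storage backend
    selection: tuple      # requested subset: one range per axis

    @classmethod
    def open(cls, metadata, store):
        full = tuple(range(n) for n in metadata["shape"])
        return cls(metadata, store, full)

    @property
    def shape(self):
        return tuple(len(r) for r in self.selection)

    def __getitem__(self, key):
        # Only slices are supported; missing axes are taken whole.
        key = key if isinstance(key, tuple) else (key,)
        key = key + (slice(None),) * (len(self.selection) - len(key))
        # Slicing a range yields a range, so no IO happens here.
        new_selection = tuple(r[k] for r, k in zip(self.selection, key))
        return LazyArray(self.metadata, self.store, new_selection)

    def materialize(self):
        # The only place where the store is read (2-D only in this sketch).
        rows, cols = self.selection
        return [[self.store.read(i, j) for j in cols] for i in rows]


store = CountingStore([[10 * i + j for j in range(5)] for i in range(5)])
arr = LazyArray.open({"shape": (5, 5)}, store)
view = arr[1:3, 2:4][0:1]      # two slicing steps, still no reads
reads_before = store.reads
data = view.materialize()      # IO happens only here
```

Because slicing is pure bookkeeping, an assignment could receive `view` and decide for itself when and how to read the underlying chunks.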
Is the chunked data used for efficient I/O at the moment? From the tracebacks above it looks like the chunk data is being decoded anyway further down the stack, so perhaps we wouldn't lose anything by just decoding and bringing into memory earlier.
Is the chunked data used for efficient I/O at the moment?
Certainly not in the case here, as we need this to not error before we can start making performance optimizations! As a short-term fix, I think collecting the data into memory as a NumPy array makes sense for now (though maybe this should be an ndbuffer, to ensure we can work with GPUs).