Modifying an array in a `MemoryStore` with multiprocessing does not work
Zarr version
3.0.8
Numcodecs version
0.16.1
Python Version
3.13.0
Operating System
macOS
Installation
uv
Description
I'm trying to modify data concurrently, making sure I'm only modifying one chunk at a time. This doesn't seem to work with multiprocessing though.
Steps to reproduce
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
import zarr
from multiprocessing import Pool
def set_to_zero(array, idx):
    array[idx] = 0

if __name__ == "__main__":
    array = zarr.ones(shape=(2,), chunks=(1,))
    print(array[:])
    # prints [1, 1]
    args = [(array, 0), (array, 1)]
    with Pool(1) as pool:
        pool.starmap(set_to_zero, args)
    print(array[:])
    # prints [1, 1]; should print [0, 0]
    for arg in args:
        set_to_zero(*arg)
    print(array[:])
    # prints [0, 0]
Additional output
No response
What does "doesn't work" mean here?
I think this is because multiprocessing is creating a copy of the array (and underlying store) in each of the processes, so it's only a copy of the array that gets modified.
In that case, what's the recommended way to modify an in-memory array in parallel using multiprocessing?
Read the code snippet - I put inline comments showing expected/unexpected behaviour.
Okay, I was expecting to see a traceback or a failed assertion.
I gather you are using a MemoryStore, that isn't going to work. I'm actually surprised this runs at all -- would have thought this would fail when you pickle the MemoryStore but maybe that isn't happening here.
I gather you are using a MemoryStore, that isn't going to work
Why not?
Ah, this does indeed work fine for a LocalStore. I guess in that case, although multiple copies of the store are created, each copy still writes to the same location (the same folder on disk).
So I guess the action for this issue is to work out how to modify an array in a MemoryStore in parallel using multiprocessing, and add that to the documentation somewhere.
You got it! Let's update the issue title to clarify this is only an issue with the MemoryStore.
I would also say that this is more of a feature request than a bug -- you're looking for something that, as far as I know, Zarr has never supported.
I would also say that this is more of a feature request than a bug
👍 - I tried with zarr-python 2 and got the same result.
Sharing memory across processes would require a special type of MemoryStore using a SharedMemory block, see https://docs.python.org/3/library/multiprocessing.shared_memory.html.
I can see how this could be useful. Very open to adding it.