zarr-python

Modifying an array in a `MemoryStore` with multiprocessing does not work

Open dstansby opened this issue 7 months ago • 9 comments

Zarr version

3.0.8

Numcodecs version

0.16.1

Python Version

3.13.0

Operating System

macOS

Installation

uv

Description

I'm trying to modify data concurrently, making sure I'm only modifying one chunk at a time. This doesn't seem to work with multiprocessing though.

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import zarr
from multiprocessing import Pool


def set_to_zero(array, idx):
    array[idx] = 0


if __name__ == "__main__":
    array = zarr.ones(shape=(2,), chunks=(1,))
    print(array[:])
    # prints [1, 1]

    args = [(array, 0), (array, 1)]

    with Pool(1) as pool:
        pool.starmap(set_to_zero, args)

    print(array[:])
    # prints [1, 1]; should print [0, 0]

    for arg in args:
        set_to_zero(*arg)

    print(array[:])
    # prints [0, 0]

Additional output

No response

dstansby • Jun 09 '25 14:06

What does "doesn't work" mean here?

jhamman • Jun 09 '25 15:06

I think this is because multiprocessing is creating a copy of the array (and underlying store) in each of the processes, so it's only a copy of the array that gets modified.

In that case, what's the recommended way to modify an in-memory array in parallel using multiprocessing?
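The copy behaviour described above can be reproduced without zarr. A minimal sketch, assuming a plain dict as a stand-in for the in-memory store (this is not zarr's MemoryStore): `multiprocessing` pickles task arguments in the parent and unpickles them in the worker, so the round trip below simulates that hand-off inside a single process.

```python
import pickle

# A plain dict stands in for an in-memory store (an assumption for
# illustration; this is not zarr's MemoryStore).
store = {"chunk0": 1, "chunk1": 1}


def set_to_zero(store, key):
    store[key] = 0


# multiprocessing pickles task arguments in the parent and unpickles them in
# the worker; this round trip simulates that hand-off inside one process.
worker_copy = pickle.loads(pickle.dumps(store))
set_to_zero(worker_copy, "chunk0")

print(worker_copy)  # the worker's copy was modified
print(store)        # the parent's store is unchanged
```

The write lands on the unpickled copy only, which matches the `[1, 1]` printed after `pool.starmap` in the reproduction script.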

dstansby • Jun 09 '25 15:06

Read the code snippet - I put inline comments showing expected/unexpected behaviour.

dstansby • Jun 09 '25 15:06

Okay, I was expecting to see a traceback or a failed assertion.

I gather you are using a MemoryStore, that isn't going to work. I'm actually surprised this runs at all -- I would have thought this would fail when you pickle the MemoryStore, but maybe that isn't happening here.

jhamman • Jun 09 '25 15:06

> I gather you are using a MemoryStore, that isn't going to work

Why not?

dstansby • Jun 09 '25 15:06

Ah, this does indeed work fine for a LocalStore. I guess in that case although multiple copies of the store are created, each copy still writes to the same location (the same folder on disk), even after being copied.

So I guess the action for this issue is to work out how to modify an array in a MemoryStore in parallel using multiprocessing, and add that to the documentation somewhere.
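Why a directory-backed store survives being copied can be sketched the same way. This is a toy illustration, not zarr's LocalStore: the "store" here is just a `pathlib.Path`, and `write_chunk` / `read_chunk` are hypothetical helpers. The pickled copy carries the same path, so writes through it land in the same files.

```python
import pickle
import tempfile
from pathlib import Path


def write_chunk(root, key, value):
    # Hypothetical helper: one file per chunk under the store's root directory.
    (root / key).write_text(str(value))


def read_chunk(root, key):
    return int((root / key).read_text())


with tempfile.TemporaryDirectory() as tmp:
    store_root = Path(tmp)
    write_chunk(store_root, "chunk0", 1)

    # Simulate the hand-off to a worker: the object is copied, but the copy
    # still refers to the same directory on disk.
    worker_root = pickle.loads(pickle.dumps(store_root))
    write_chunk(worker_root, "chunk0", 0)

    # The parent sees the change because both objects point at the same files.
    result = read_chunk(store_root, "chunk0")
    print(result)
```

Here the state lives on disk rather than in the object, so copying the object does not copy the data.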

dstansby • Jun 09 '25 15:06

You got it! Let's update the issue title to clarify this is only an issue with the MemoryStore.

I would also say that this is more of a feature request than a bug -- you're looking for something that, as far as I know, Zarr has never supported.

jhamman • Jun 09 '25 15:06

> I would also say that this is more of a feature request than a bug

👍 - I tried with zarr-python 2 and got the same result.

dstansby • Jun 09 '25 15:06

Sharing memory across processes would require a special type of MemoryStore using a SharedMemory block, see https://docs.python.org/3/library/multiprocessing.shared_memory.html.

I can see how this could be useful. Very open to adding it.
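For reference, a minimal sketch of the standard-library primitive linked above. It is not a zarr store and does not use any zarr API; it only shows two worker processes attaching to one `SharedMemory` block by name and writing to separate bytes, with `set_to_zero` and `run_demo` as illustrative names.

```python
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory


def set_to_zero(shm_name, idx):
    # Attach to the existing block by name; the data itself is not copied.
    shm = SharedMemory(name=shm_name)
    try:
        shm.buf[idx] = 0
    finally:
        shm.close()


def run_demo():
    # Two bytes play the role of two single-element chunks.
    shm = SharedMemory(create=True, size=2)
    try:
        shm.buf[0] = 1
        shm.buf[1] = 1
        workers = [
            Process(target=set_to_zero, args=(shm.name, i)) for i in range(2)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        # Copy the values out before the block is closed and unlinked.
        return list(shm.buf[:2])
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    print(run_demo())
```

Only the block's name is sent to each worker, so the workers' writes are visible to the parent. A store built on this idea would still need to map chunk keys onto regions of the block, which this sketch does not attempt.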

rabernat • Jun 09 '25 15:06