zarr-python

Storage option that supports concurrent writes to a single local file?

Open 5xx7xx opened this issue 2 months ago • 3 comments

I'm running into an issue where there doesn't seem to be a storage option in Zarr 3 that would let me write to different chunks in the same array concurrently and have them all contained in one local file. ZipStore doesn't support concurrent writes. A LocalStore becomes really unwieldy with many chunks, and manually zipping such a LocalStore after writing takes a long time and doubles the storage space needed. A bunch of other store types that supported concurrency were removed in the move from v2 to v3. I feel like I must be missing something here and there is a way to do this, because I thought concurrency was a focus in Zarr.

5xx7xx • Oct 15 '25 08:10

> I'm running into an issue where there doesn't seem to be a storage option in Zarr 3 that would let me write to different chunks in the same array concurrently and have them be contained in one local file.

In principle, concurrently writing to separate regions of the same file is only possible in particular circumstances: when you use no compression, or when your compression routine has a guaranteed upper bound on the size of the compressed data. Otherwise the byte range each chunk will occupy isn't known before the chunk is encoded, so writers can't be assigned non-overlapping regions in advance. Zarr-Python doesn't currently handle these cases (but we would like to eventually).
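
To illustrate the uncompressed case, here is a minimal sketch in plain Python. It is not Zarr functionality. It assumes every chunk has the same fixed byte size, so each writer can compute its own offset and write to a disjoint region of one preallocated file. The names `CHUNK_NBYTES`, `N_CHUNKS`, `write_chunk`, `write_all`, `read_chunk`, and `demo` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK_NBYTES = 1024  # assumed fixed size of every uncompressed chunk
N_CHUNKS = 8         # assumed number of chunks in the array


def write_chunk(path, index):
    """Write one chunk at its precomputed offset using a private file handle."""
    payload = bytes([index]) * CHUNK_NBYTES  # stand-in for real chunk bytes
    with open(path, "r+b") as f:
        f.seek(index * CHUNK_NBYTES)  # offset depends only on the chunk index
        f.write(payload)


def write_all(path):
    """Preallocate the file, then fill the chunk regions from several threads."""
    with open(path, "wb") as f:
        f.truncate(N_CHUNKS * CHUNK_NBYTES)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(lambda i: write_chunk(path, i), range(N_CHUNKS)))


def read_chunk(path, index):
    """Read one chunk back from its fixed offset."""
    with open(path, "rb") as f:
        f.seek(index * CHUNK_NBYTES)
        return f.read(CHUNK_NBYTES)


def demo():
    """Write all chunks to a temporary file and return (file size, first byte of chunk 3)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "chunks.bin")
        write_all(path)
        return os.path.getsize(path), read_chunk(path, 3)[:1]
```

With a compressor that has no size bound, the `index * CHUNK_NBYTES` offset calculation has no equivalent, which is why this approach does not carry over to compressed chunks.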

Our current recommendation is to use sharding for your data, and write entire shards concurrently. But I think it would be nice to offer a function for taking data saved with a regular chunk scheme and re-packing it with sharding.
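
A rough sketch of what writing entire shards concurrently could look like with Zarr-Python 3, assuming `zarr.create_array` with its `shards` and `chunks` parameters and a local directory path as the store. The helper names `shard_regions` and `write_sharded`, the shapes, and the thread count are placeholders chosen for illustration. With a local store this yields one file per shard rather than one file for the whole array, so it reduces the file count without producing a single file.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def shard_regions(shape, shard_shape):
    """Yield one tuple of slices per shard, together covering the whole array."""
    starts = [range(0, n, s) for n, s in zip(shape, shard_shape)]
    for corner in product(*starts):
        yield tuple(
            slice(c, min(c + s, n))
            for c, s, n in zip(corner, shard_shape, shape)
        )


def write_sharded(path, shape=(4096, 4096), shard_shape=(1024, 1024),
                  chunk_shape=(128, 128)):
    """Create a sharded array at `path` and fill it one whole shard per task."""
    # Imported here so shard_regions stays usable without these packages.
    import numpy as np
    import zarr

    arr = zarr.create_array(
        store=path,
        shape=shape,
        shards=shard_shape,   # one stored object (file) per shard
        chunks=chunk_shape,   # inner chunks packed inside each shard
        dtype="float32",
    )

    def write_one(region):
        # Each task writes exactly one shard, so no two tasks touch the same file.
        block_shape = tuple(s.stop - s.start for s in region)
        arr[region] = np.ones(block_shape, dtype="float32")  # placeholder data

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(write_one, shard_regions(shape, shard_shape)))
    return arr
```

The important design point is that every write is aligned to shard boundaries. Two workers writing different chunks inside the same shard would have to rewrite the same stored object, which is the conflict the recommendation avoids.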

d-v-b • Oct 15 '25 09:10

Thank you for the quick reply! Your explanation makes sense, and sharding sounds applicable to my use case. How did Zarr v2 handle this problem of concurrent writing? I've never used it, but weren't there many store types that supported it?

5xx7xx • Oct 15 '25 09:10

Sharding was introduced in Zarr v3. The typical usage for Zarr v2 was 1 chunk : 1 file. Other than zip storage, I'm not aware of any support for packing multiple chunks into a single file in Zarr v2, so this wasn't a common problem.

d-v-b • Oct 15 '25 09:10