Storage option that supports concurrent writes to a single local file?
I'm running into an issue where there doesn't seem to be a storage option in Zarr 3 that lets me write to different chunks of the same array concurrently while keeping them contained in one local file:

- ZipStore doesn't support concurrent writes.
- A LocalStore becomes really unwieldy with many chunks.
- Manually zipping such a LocalStore after writing takes a long time and temporarily doubles the needed storage space.
- Several other store types that supported concurrency were removed in the move from v2 to v3.

I feel like I must be missing something here and there is a way to do this, because I thought concurrency was a focus in Zarr.
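For reference, the workaround I'm using right now looks roughly like this (shapes, paths, and the thread pool are just placeholders): write concurrently into a LocalStore directory, then zip it up afterwards.

```python
import shutil
import numpy as np
import zarr
from concurrent.futures import ThreadPoolExecutor

# Step 1: write chunks concurrently to a directory store (one file per chunk).
arr = zarr.create_array(
    store="scratch.zarr",
    shape=(4000, 4000),
    chunks=(1000, 1000),
    dtype="float32",
)

def write_block(ij):
    i, j = ij
    block = np.ones((1000, 1000), dtype="float32") * (i * 4 + j)
    arr[i * 1000:(i + 1) * 1000, j * 1000:(j + 1) * 1000] = block

with ThreadPoolExecutor() as pool:
    list(pool.map(write_block, [(i, j) for i in range(4) for j in range(4)]))

# Step 2: pack the directory into a single file after the fact. This is the
# slow part, and it temporarily doubles the disk space needed.
shutil.make_archive("packed", "zip", "scratch.zarr")
```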
> I'm running into an issue where there doesn't seem to be a storage option in Zarr 3 that would let me write to different chunks in the same array concurrently and have them be contained in one local file.
In principle, concurrently writing to separate regions of the same file is only possible in particular circumstances: when you use no compression, or when your compression routine has a guaranteed upper bound on the size of the compressed data (so each chunk's byte region can be reserved in advance). Zarr-Python doesn't currently handle these cases (but we would like to eventually).
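To illustrate what that would require, here's a minimal sketch outside of Zarr entirely (nothing below is Zarr API, and all names are made up): with fixed-size, uncompressed chunks, every chunk's byte offset in a single file is known up front, so independent writers can target disjoint regions. Note that `os.pwrite` is POSIX-only.

```python
import os
import numpy as np
from concurrent.futures import ThreadPoolExecutor

CHUNK_SHAPE = (100, 100)
DTYPE = np.dtype("float32")
CHUNK_NBYTES = int(np.prod(CHUNK_SHAPE)) * DTYPE.itemsize
N_CHUNKS = 16
PATH = "flat_chunks.bin"

# Pre-size the file so every writer has its byte region reserved up front.
with open(PATH, "wb") as f:
    f.truncate(N_CHUNKS * CHUNK_NBYTES)

def write_chunk(i: int) -> None:
    data = np.full(CHUNK_SHAPE, i, dtype=DTYPE)
    fd = os.open(PATH, os.O_WRONLY)
    try:
        # pwrite targets an absolute offset, so writers don't share a file cursor.
        os.pwrite(fd, data.tobytes(), i * CHUNK_NBYTES)
    finally:
        os.close(fd)

with ThreadPoolExecutor() as pool:
    list(pool.map(write_chunk, range(N_CHUNKS)))
```

This only works because every chunk occupies exactly `CHUNK_NBYTES`; as soon as compression makes chunk sizes variable, the offsets can no longer be fixed in advance.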
Our current recommendation is to use sharding for your data and to write entire shards concurrently: each shard is a single file that bundles many chunks, so you get far fewer files while writes to different shards never touch the same file. But I think it would be nice to offer a function that takes data saved with a regular chunk scheme and re-packs it with sharding.
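As a rough sketch of what that looks like with zarr-python 3 (the shape, shard size, and chunk size below are arbitrary): create the array with a shard size that is a multiple of the chunk size, then have each worker write a region aligned to shard boundaries, so no two workers ever touch the same shard file.

```python
import numpy as np
import zarr
from concurrent.futures import ThreadPoolExecutor

SHARD = (1000, 1000)

# Each shard is one file on disk and internally holds a 10 x 10 grid of
# 100 x 100 chunks, so 16 shard files replace 1600 chunk files.
arr = zarr.create_array(
    store="example.zarr",
    shape=(4000, 4000),
    shards=SHARD,
    chunks=(100, 100),
    dtype="float32",
)

def write_shard(ij):
    i, j = ij
    rng = np.random.default_rng(i * 4 + j)
    block = rng.random(SHARD, dtype="float32")
    # A write aligned to shard boundaries touches exactly one shard file,
    # so separate workers never write to the same file.
    arr[i * SHARD[0]:(i + 1) * SHARD[0], j * SHARD[1]:(j + 1) * SHARD[1]] = block

coords = [(i, j) for i in range(4) for j in range(4)]
with ThreadPoolExecutor() as pool:
    list(pool.map(write_shard, coords))
```

The shard size is the knob here: it trades off file count against write granularity, and each writer needs to own whole shards for the concurrent writes to stay safe.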
Thank you for the quick reply! Your explanation makes sense, and sharding sounds applicable to my use case. How did Zarr v2 handle this problem of concurrent writing? I've never used v2, but weren't there many store types that supported it?
Sharding was introduced in Zarr v3. The typical usage in Zarr v2 was one chunk per file. Other than zip storage, I'm not aware of any support for packing multiple chunks into a single file in Zarr v2, so this wasn't a common problem.