Possible feature request on saving array mechanism
Hello all,
As far as I understand, the current zarr-python implementation overwrites all chunks by default when an array that has already been saved is modified and then saved again. (Correct me if I am wrong; here is what I am looking at, for example: https://github.com/zarr-developers/zarr-python/blob/505810c44108328ec5732ad8460057f016994fd3/zarr/convenience.py#L170 .) In some cases (such as image acquisition software that keeps writing more chunks as data arrives over hours or days), it can be wasteful to overwrite all chunks, especially if only the new chunks differ.
A less wordy explanation of the concern:
- imagine you have an array on disk with 1000 chunks.
- you want to append let's say 1000 more chunks of data to the array.
- you want the zarr API to realize that the first 1000 chunks will be identical anyway, so it does not spend time overwriting them and only adds the new chunks.
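
To make the scenario concrete, here is a minimal sketch of the bookkeeping involved, in plain Python. The helper name `chunks_to_write` is hypothetical and not part of zarr; it only assumes the array grows along a single axis with a fixed chunk length on that axis.

```python
import math


def chunks_to_write(old_len, new_len, chunk_len):
    """Chunk indices along the growing axis that need to be written.

    Hypothetical helper for illustration only; not a zarr function.
    """
    if new_len <= old_len:
        return range(0)  # nothing was appended
    # Chunk holding the first new element. If old_len is not a multiple of
    # chunk_len, this chunk already exists but is only partially filled.
    first = old_len // chunk_len
    # One past the last chunk needed to cover new_len elements.
    stop = math.ceil(new_len / chunk_len)
    return range(first, stop)


# 1000 chunks on disk (one element per chunk here), 1000 more appended:
touched = chunks_to_write(old_len=1000, new_len=2000, chunk_len=1)
print(len(touched))  # 1000 chunks to write, instead of all 2000
```

When the old length is not aligned to the chunk length, the last existing chunk is included as well, since it has to be completed with new data.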
Here at the opensci2022 meeting, I have been discussing this with @jakirkham, and he suggested that one can resize the array first and fill only the new chunks with the newly available values/frames. I think it is a valid way to address the concern. I would like to discuss whether we can implement this internally and do it by default if possible. It may or may not change the existing public API (happy to discuss here). A few implementation ideas:
- there is a `require_dataset` endpoint: https://github.com/zarr-developers/zarr-python/blob/ce129a560d48854aee533bb2699a3f28b396bc22/zarr/hierarchy.py#L997 . Maybe we can implement a similar function, `require_chunks`, that does the check internally, and call it in the `save_array` endpoint?
- there is already an append API here: https://github.com/zarr-developers/zarr-python/blob/43266eec01561186b1b32e2fe3b12247130a0f0d/zarr/core.py#L2507 , but I am not sure whether it would work as I explain above in all cases. As I understand it, it works along one axis at a time.
Any ideas/comments/discussions welcome!
Thanks, @AhmetCanSolak. Cross-linking here, as discussed during the community meeting: https://github.com/zarr-developers/zarr-python/issues/1017