zarr-python
write behavior for empty chunks
In v2, it is possible to choose at array access time whether empty chunks (defined as chunks whose elements all equal `fill_value`) should be written to storage or skipped. This is an extremely useful feature for high-latency storage backends, or in any context where having too many objects in storage is burdensome.
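To make the definition of an "empty" chunk concrete, here is a minimal sketch. `is_empty_chunk` is a hypothetical helper, not part of zarr's API, and it assumes a chunk arrives as a flat sequence of scalars. One detail worth calling out is that a NaN `fill_value` needs special handling, because NaN never compares equal to itself, so a naive equality check would never consider any chunk empty.

```python
import math
from typing import Iterable, Optional


def _is_nan(x: object) -> bool:
    """True only for float NaN values."""
    return isinstance(x, float) and math.isnan(x)


def is_empty_chunk(chunk: Iterable[object], fill_value: Optional[object]) -> bool:
    """Return True if every element of ``chunk`` equals ``fill_value``.

    Hypothetical helper for illustration only. NaN is treated as equal to NaN.
    """
    if fill_value is None:
        # Assumed convention for this sketch: with no fill_value, no chunk is empty.
        return False
    if _is_nan(fill_value):
        return all(_is_nan(x) for x in chunk)
    return all(x == fill_value for x in chunk)
```

For example, `is_empty_chunk([0, 0, 0], 0)` is `True`, while `is_empty_chunk([0, 1, 0], 0)` is `False`.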
We don't support this in v3 yet, but we should. How should we do it? I will throw out a few options, from most to least practical:
- Emulate v2: provide a keyword argument like `write_empty_chunks` when accessing an array. All chunk writes from that array will be affected.
- Put the `write_empty_chunks` setting in a global config. All chunk writes from all arrays in a session will be affected by the config parameter.
- Design an API for array IO wherein IO is wrapped in a context that can be parametrized, e.g. with a context manager, and one of those parameters is the `write_empty_chunks`-ness of the write transaction. Highly speculative.
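The three options can be compared side by side in a toy model. The sketch below is purely illustrative: `ToyArray`, `GLOBAL_CONFIG`, and `write_options` are hypothetical names, not zarr APIs, and the precedence order (context override, then per-array keyword, then global config) is an assumption made for the sketch. It also assumes that skipping an empty write should delete any previously stored object for that chunk, so reads fall back to `fill_value`.

```python
import contextlib
import contextvars
import math
from typing import Dict, Iterator, List, Optional

# Option 2 (hypothetical): a session-wide, mutable default.
GLOBAL_CONFIG: Dict[str, bool] = {"write_empty_chunks": True}

# Option 3 (hypothetical): a per-context override; None means "not set".
_WRITE_EMPTY_OVERRIDE = contextvars.ContextVar("write_empty_override", default=None)


@contextlib.contextmanager
def write_options(*, write_empty_chunks: bool) -> Iterator[None]:
    """Temporarily override write_empty_chunks for writes made in this context."""
    token = _WRITE_EMPTY_OVERRIDE.set(write_empty_chunks)
    try:
        yield
    finally:
        _WRITE_EMPTY_OVERRIDE.reset(token)


def _all_fill(values: List[float], fill_value: float) -> bool:
    """True if every value equals fill_value (NaN compared as equal to NaN)."""
    if math.isnan(fill_value):
        return all(math.isnan(v) for v in values)
    return all(v == fill_value for v in values)


class ToyArray:
    """A 1-D array split into fixed-size chunks, stored in a plain dict."""

    def __init__(self, chunk_size: int, fill_value: float = 0.0,
                 write_empty_chunks: Optional[bool] = None) -> None:
        self.chunk_size = chunk_size
        self.fill_value = fill_value
        # Option 1 (hypothetical): per-array setting chosen at access time.
        self.write_empty_chunks = write_empty_chunks
        self.store: Dict[int, List[float]] = {}

    def _resolve(self) -> bool:
        # Assumed precedence: context override > array keyword > global config.
        override = _WRITE_EMPTY_OVERRIDE.get()
        if override is not None:
            return override
        if self.write_empty_chunks is not None:
            return self.write_empty_chunks
        return GLOBAL_CONFIG["write_empty_chunks"]

    def write_chunk(self, index: int, values: List[float]) -> None:
        if not self._resolve() and _all_fill(values, self.fill_value):
            # Skip the write and drop any stale object for this chunk.
            self.store.pop(index, None)
            return
        self.store[index] = list(values)

    def read_chunk(self, index: int) -> List[float]:
        # Missing chunks read back as fill_value.
        return self.store.get(index, [self.fill_value] * self.chunk_size)
```

With `a = ToyArray(chunk_size=4, write_empty_chunks=False)`, writing `[0.0] * 4` to chunk 0 stores nothing, while the same write inside `with write_options(write_empty_chunks=True):` does store the chunk. The sketch also shows the trade-off raised below: the context override lets one array behave conditionally, at the cost of extra machinery, while the global dict is the mutable shared state.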
The first option seems pretty expedient, and I don't think we had a lot of problems with this approach in v2. The only drawback is that if people want the same array to exhibit conditional `write_empty_chunks` behavior, they might need something like the second approach, which has its own drawbacks IMO (I'm not a big fan of mutable global state).
I would propose that we emulate v2 for now (i.e., make `write_empty_chunks` a keyword argument to array access), note any friction this causes, and consider ways to alleviate it in a subsequent design refresh if the friction is severe.
cc @constantinpape