
Feat/write empty chunks

Open d-v-b opened this issue 4 months ago • 6 comments

This PR adds a boolean array.write_empty_chunks value to the global config, and uses this value to control whether chunks that are "empty", i.e. filled with values equivalent to the array's fill value, are written to storage.

In zarr-python 2.x, write_empty_chunks was a property of an Array that users specified when creating the Array object. This had pros and cons which I'm happy to discuss if people are interested, but the tl;dr is that the cons of that approach are driving my decision in this PR to make write_empty_chunks a global runtime property accessible via the config API.

Usage looks something like this (donfig experts please correct me if there's a better way):

with config.set({"array.write_empty_chunks": write_empty_chunks}):
    arr[:] = fill_value

If people hate this, then we can definitely change this API. I'm very open to discussion here.

Also worth noting:

Our check for whether a chunk is equal to the fill value is pretty inefficient -- it's allocating a new array for every check invocation. This can definitely be made more efficient, in a stupid way by caching an all-fill-value chunk on the array instance and using that for the comparison, or a smarter way by doing the (chunk, fill_value) comparison without allocating a new array. But I think this is an effort for a separate PR.
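
A rough sketch of the two options mentioned above, assuming chunks are NumPy arrays. The function names are hypothetical and this is not the check zarr-python currently uses. Note that the scalar comparison still creates a temporary boolean array; it only avoids materializing a full chunk of fill values on every call.

```python
import numpy as np


def make_fill_chunk(shape, fill_value, dtype):
    """Build the all-fill-value chunk once so it can be cached and reused."""
    return np.full(shape, fill_value, dtype=dtype)


def is_empty_cached(chunk, fill_chunk):
    """Option 1: compare against a cached all-fill-value chunk."""
    return bool(np.array_equal(chunk, fill_chunk))


def is_empty_scalar(chunk, fill_value):
    """Option 2: compare against the scalar; no fill-value chunk is built."""
    # NaN never compares equal to itself, so handle a NaN fill value explicitly.
    if chunk.dtype.kind in "fc" and np.isnan(fill_value):
        return bool(np.isnan(chunk).all())
    return bool(np.all(chunk == fill_value))
```

In option 1 the cached chunk would be built once per array (for example when the array is opened) and reused for every check. Option 2 needs no cache, but it has to special-case NaN fill values.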

closes #2409

TODO:

  • [ ] Add unit tests and/or doctests in docstrings
  • [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
  • [ ] New/modified features documented in docs/tutorial.rst
  • [ ] Changes documented in docs/release.rst
  • [ ] GitHub Actions have all passed
  • [ ] Test coverage is 100% (Codecov passes)

d-v-b, Oct 22 '24 10:10