zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

forward write_empty_chunks kwarg in group.require_dataset

Open d-v-b opened this issue 2 years ago • 9 comments

Currently when using a Group to get an existing Array via Group.require_dataset, there's no way to control the "write_empty_chunksness" of the array (since Array.write_empty_chunks is not part of the array metadata). This means that you cannot call group.require_dataset('foo', shape=10, dtype='i4', write_empty_chunks=False) and get an array that has the desired .write_empty_chunks property (instead the write_empty_chunks kwarg is ignored).

A few other array properties are similar (synchronizer, cache_metadata, and cache_attrs), and in main these keyword arguments are extracted from the **kwargs argument to Group.require_dataset before being passed to the Array constructor. This PR expands this behavior to include write_empty_chunks, thereby enabling the control of the write_empty_chunks property of the arrays returned by Group.require_dataset.

I also added a lot of type annotations.

TODO:

  • [x] Add unit tests and/or doctests in docstrings
  • [x] Add docstrings and API docs for any new/modified user-facing classes and functions
  • [ ] New/modified features documented in docs/tutorial.rst
  • [ ] Changes documented in docs/release.rst
  • [ ] GitHub Actions have all passed
  • [ ] Test coverage is 100% (Codecov passes)

d-v-b avatar Jun 22 '22 19:06 d-v-b

looks like CI is failing for python 3.7 due to some type annotation stuff... do we need to support 3.7 still?

d-v-b avatar Jun 22 '22 19:06 d-v-b

looks like CI is failing for python 3.7 due to some type annotation stuff... do we need to support 3.7 still?

  • Happy to hear opinions but depending on when you are targeting this for, I'd be inclined to hold off on another the drop page.
  • https://github.com/python/typing/issues/707 suggests adding typing_extensions as a dependency.
  • Alternatively, we hold off on the use of Literal.

joshmoore avatar Jun 23 '22 06:06 joshmoore

Codecov Report

Merging #1051 (89f78a7) into main (5c602cb) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1051   +/-   ##
=======================================
  Coverage   99.94%   99.94%           
=======================================
  Files          34       34           
  Lines       13842    13865   +23     
=======================================
+ Hits        13835    13858   +23     
  Misses          7        7           
Impacted Files Coverage Δ
zarr/creation.py 100.00% <ø> (ø)
zarr/_storage/store.py 100.00% <100.00%> (ø)
zarr/_storage/v3.py 100.00% <100.00%> (ø)
zarr/convenience.py 100.00% <100.00%> (ø)
zarr/core.py 100.00% <100.00%> (ø)
zarr/hierarchy.py 99.79% <100.00%> (+<0.01%) :arrow_up:
zarr/meta.py 100.00% <100.00%> (ø)
zarr/storage.py 100.00% <100.00%> (ø)
zarr/tests/test_hierarchy.py 100.00% <100.00%> (ø)
zarr/util.py 100.00% <100.00%> (ø)

codecov[bot] avatar Jun 23 '22 12:06 codecov[bot]

Hello @d-v-b! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2022-07-19 17:27:41 UTC

pep8speaks avatar Jun 27 '22 01:06 pep8speaks

mypy is almost happy. A few things that I would appreciate input on:

  • The _ensure_store methods of BaseStore and StoreV3 in _storage/store.py can return None, which contradicts the docstrings for those methods -- as advertised, these methods should return valid store instances. Should they instead return some default store when the input is None? This is causing some mypy issues that I can get around with calls to cast, but it struck me as odd that the docstring mismatches the implementation here. cc @grlee77
  • there are a few places where we call __init__ directly (namely setstate methods), and mypy hates that. Is there an alternative to calling __init__ directly, or should I just blind mypy to that code?

d-v-b avatar Jun 27 '22 13:06 d-v-b

_ensure_store methods of BaseStore and StoreV3 in _storage/store.py can return None

My guess is that they should throw on None.

https://github.com/zarr-developers/zarr-python/blob/main/zarr/storage.py#L787

Extracting the entire contents of __init__ out to an __initialize(). In that case, I do wonder what happens to the lock though....

joshmoore avatar Jul 04 '22 14:07 joshmoore

@grlee77 can you shed some light on whether the _ensure_store methods defined in _storage/store.py should handle None? Returning None is contrary to the docstrings, but there's a test for this exact behavior here. If I remove the None-handling logic from _ensure_store and the test for this behavior, no other tests fail...

d-v-b avatar Jul 05 '22 21:07 d-v-b

That is how _ensure_store was original implemented in #612, but the test case wasn't there at that time. Most likely, I added the test case later to complete test coverage, but probably should have just removed that case instead. Perhaps it was being used at some point in an earlier draft, but does not currently seem to be needed. I will make a PR to remove it.

grlee77 avatar Jul 06 '22 15:07 grlee77

I was thinking merging @grlee77's PR would clear up your mypy woes, @d-v-b, but there look to be a few more.

joshmoore avatar Jul 11 '22 12:07 joshmoore