zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

[WIP] V3 group tests

Open d-v-b opened this issue 10 months ago • 3 comments

Adds tests for the Group API in v3. In progress. I will remove the in progress label when it's ready for a final review, but I encourage people with an interest in the testing infrastructure to look at the early changes and weigh in if you see anything you don't like. It will be easier to course-correct before a lot of code is written.

To outline the strategy I'm using:

  • no classes (or at least, no inheritance)
  • parameterize / fixturize as much as possible
  • every method of the Group / AsyncGroup classes gets a test, that should parametrize over the arguments of that method

This is a deliberate departure from the strategy used in the v2 tests, where test classes are defined that inherit from a base test class, potentially overriding methods. I'm pretty sure we can use pytest fixtures to express the same functionality without needing the cognitive overhead of inheritance, but this is my personal judgment, and I'm open to other ideas here. The goal is just to have a test suite that's easy for developers to understand, which (in my experience) is not a criterion the v2 tests always meet.

I don't think I will fix failing tests here. That should probably done in separate PRs, once we agree on the shape of the testing infrastructure. edit: amendment to this: since test failures do slow down the creation of new tests, I'm going to apply some fixes to obvious bugs revealed in the process of adding tests. These fixes should be separable in the commit history, and on the basis of the files they affect.

depends on #1726

TODO:

  • [ ] Add unit tests and/or doctests in docstrings
  • [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
  • [ ] New/modified features documented in docs/tutorial.rst
  • [ ] Changes documented in docs/release.rst
  • [ ] GitHub Actions have all passed
  • [ ] Test coverage is 100% (Codecov passes)

d-v-b avatar Apr 04 '24 20:04 d-v-b

Some of my group tests are running into bugs from the storage side. So I'm broadening the scope of this effort to include tests for the stores.

d-v-b avatar Apr 05 '24 11:04 d-v-b

Hello @d-v-b! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 62:29: E203 whitespace before ':'

Comment last updated at 2024-05-10 08:58:14 UTC

pep8speaks avatar Apr 05 '24 13:04 pep8speaks

Because of the intersection of the store design and implicit groups, we are a bit squeezed when it comes to performance / ergonomics for discovery routines like Group.open().

Consider this code:

# create a group
g = Group.open("group_path") 
g.open("X")

there are three possibilities to handle:

  • group_path/X/zarr.json exists. Then we attempt to parse it, and produce a group or array.
  • group_path/X is the key of an object in the store. Then group_path/X/zarr.json cannot exist, and so we should raise a KeyError
  • The key group_path/X/zarr.json does not exist. This has two sub-cases.
    • group_path/X/ is a prefix for the key of some Zarr metadata object, like group_path/X/Y/zarr.json. Then group_path/X should be treated as the key for an implicit group according to the spec.
    • group_path/X/ is not a prefix for any key in in the store, (e.g., because X is just a random string a user typed in by accident). Numerically, this is the most abundant situation (most stores only contain a small subset of all possible keys). The only way we can test for this is by calling store.list_dir and checking whether X appears in the list of prefixes returned by list_dir. I don't think calling list_dir for every Group.open() call is a good idea, but that's our only option with the current v3 spec and the current store design.

Please someone correct me if I'm wrong here, but it looks like if we support implicit groups, and we don't want Group.open(blabalabla) to nearly always succeed, then we have to accept terrible performance (due to the need to run a directory listing). This entails that stores that don't support directory listing will appear to contain an infinite number of implicit groups. Not great. We should really remove implicit groups from the spec.

d-v-b avatar Apr 16 '24 09:04 d-v-b

I think this is ready to go

d-v-b avatar May 10 '24 09:05 d-v-b

one large, but minor (or petty) change in the latest version of this effort: I renamed test_storage.py to test_store.py, in order to be consistent with the naming: we have zarr.store, not zarr.storage.

d-v-b avatar May 15 '24 15:05 d-v-b