zarr-python
zarr-python copied to clipboard
[WIP] V3 group tests
Adds tests for the Group API in v3. In progress. I will remove the in progress label when it's ready for a final review, but I encourage people with an interest in the testing infrastructure to look at the early changes and weigh in if you see anything you don't like. It will be easier to course-correct before a lot of code is written.
To outline the strategy I'm using:
- no classes (or at least, no inheritance)
- parameterize / fixturize as much as possible
- every method of the Group / AsyncGroup classes gets a test, that should parametrize over the arguments of that method
This is a deliberate departure from the strategy used in the v2 tests, where test classes are defined that inherit from a base test class, potentially overriding methods. I'm pretty sure we can use pytest fixtures to express the same functionality without needing the cognitive overhead of inheritance, but this is my personal judgment, and I'm open to other ideas here. The goal is just to have a test suite that's easy for developers to understand, which (in my experience) is not a criterion the v2 tests always meet.
I don't think I will fix failing tests here. That should probably done in separate PRs, once we agree on the shape of the testing infrastructure. edit: amendment to this: since test failures do slow down the creation of new tests, I'm going to apply some fixes to obvious bugs revealed in the process of adding tests. These fixes should be separable in the commit history, and on the basis of the files they affect.
depends on #1726
TODO:
- [ ] Add unit tests and/or doctests in docstrings
- [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
- [ ] New/modified features documented in docs/tutorial.rst
- [ ] Changes documented in docs/release.rst
- [ ] GitHub Actions have all passed
- [ ] Test coverage is 100% (Codecov passes)
Some of my group tests are running into bugs from the storage side. So I'm broadening the scope of this effort to include tests for the stores.
Hello @d-v-b! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file
tests/v3/test_storage.py
:
Line 62:29: E203 whitespace before ':'
Comment last updated at 2024-05-10 08:58:14 UTC
Because of the intersection of the store design and implicit groups, we are a bit squeezed when it comes to performance / ergonomics for discovery routines like Group.open()
.
Consider this code:
# create a group
g = Group.open("group_path")
g.open("X")
there are three possibilities to handle:
-
group_path/X/zarr.json
exists. Then we attempt to parse it, and produce a group or array. -
group_path/X
is the key of an object in the store. Thengroup_path/X/zarr.json
cannot exist, and so we should raise aKeyError
- The key
group_path/X/zarr.json
does not exist. This has two sub-cases. -
-
group_path/X/
is a prefix for the key of some Zarr metadata object, likegroup_path/X/Y/zarr.json
. Thengroup_path/X
should be treated as the key for an implicit group according to the spec.
-
-
-
group_path/X/
is not a prefix for any key in in the store, (e.g., becauseX
is just a random string a user typed in by accident). Numerically, this is the most abundant situation (most stores only contain a small subset of all possible keys). The only way we can test for this is by callingstore.list_dir
and checking whetherX
appears in the list of prefixes returned bylist_dir
. I don't think callinglist_dir
for everyGroup.open()
call is a good idea, but that's our only option with the current v3 spec and the current store design.
-
Please someone correct me if I'm wrong here, but it looks like if we support implicit groups, and we don't want Group.open(blabalabla)
to nearly always succeed, then we have to accept terrible performance (due to the need to run a directory listing). This entails that stores that don't support directory listing will appear to contain an infinite number of implicit groups. Not great. We should really remove implicit groups from the spec.
I think this is ready to go
one large, but minor (or petty) change in the latest version of this effort: I renamed test_storage.py
to test_store.py
, in order to be consistent with the naming: we have zarr.store
, not zarr.storage
.