imitation
Introduce hypothesis testing to BC tests
Description
This PR introduces hypothesis tests for BC and does some general cleanup such as introducing a GIVEN, WHEN, THEN structure where appropriate and renaming tests to make it clear what they test.
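To illustrate the style this PR introduces, here is a minimal sketch of a hypothesis property test with GIVEN/WHEN/THEN comments. The function under test is a toy stand-in, not anything from the imitation test suite:

```python
from hypothesis import given, settings
import hypothesis.strategies as st


def discretize(x: float, n_bins: int) -> int:
    """Toy function standing in for the real system under test."""
    return min(int(x * n_bins), n_bins - 1)


@settings(max_examples=50)
@given(
    x=st.floats(min_value=0.0, max_value=1.0),
    n_bins=st.integers(min_value=1, max_value=16),
)
def test_discretize_returns_valid_bin(x: float, n_bins: int) -> None:
    # GIVEN a value in [0, 1] and a bin count (drawn by hypothesis)
    # WHEN the value is discretized
    bin_idx = discretize(x, n_bins)
    # THEN the result is a valid bin index
    assert 0 <= bin_idx < n_bins
```

Calling the decorated function runs the property over many generated inputs, which is what replaces hand-picked parametrizations in the BC tests.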
- [ ] ~~Make sure the hypothesis cache is preserved~~ (no decent way to do this; even the hypothesis developers don't do it in their CI)
- [x] Move test utilities to a different module
- [x] Investigate whether `_DataLoaderThatFailsOnNthIter` really behaves as intended (it does)
- [x] Stop parameterizing the vecenv size where it makes no sense (decided it makes sense everywhere)
Testing
Nah, I did not test the tests ...
Codecov Report
Merging #569 (5650240) into master (288c25a) will increase coverage by 0.00%. The diff coverage is 100.00%.
```
@@           Coverage Diff           @@
##           master     #569   +/- ##
=====================================
  Coverage   97.47%   97.48%
=====================================
  Files          83       85      +2
  Lines        8082     8105     +23
=====================================
+ Hits         7878     7901     +23
  Misses        204      204
```
Impacted Files | Coverage Δ
---|---
src/imitation/testing/expert_trajectories.py | 100.00% <100.00%> (ø)
tests/algorithms/conftest.py | 100.00% <100.00%> (ø)
tests/algorithms/test_bc.py | 100.00% <100.00%> (ø)
tests/algorithms/test_dagger.py | 100.00% <100.00%> (ø)
tests/conftest.py | 100.00% <100.00%> (ø)
Also, one test is failing on Mac OS (maybe flaky? I don't see anything Mac OS specific). And type checking is failing --- I think pytype may be confused by the presence of two `conftest.py` files; adding `__init__.py` to `tests/algorithms` may help with this.
> Also one test is failing on Mac OS (maybe flaky? I don't see anything Mac OS specific)

Me neither. It definitely is one of the flaky tests. Should probably be investigated in a separate PR.
> And type checking failing
Yep, thanks for the hint. That fixed it.
I'm a bit unsure about adding some of the additions to `imitation.testing`. It seems best to reserve this for generally useful methods that will either be used across multiple files in our test suite (and don't make sense to share as a fixture in `conftest`) or by users of the library.
I agree. I was unsure where to place all the helpers too. In the end I decided to move them out of the way to improve the readability of the tests. Placing them into `conftest.py` was somewhat weird because of the special treatment it gets; I assumed it was reserved for fixtures. Where would you recommend putting them? Maybe just moving the failing data loader back to the test is reasonable ...
> Maybe just moving the failing data loader back to the test is reasonable ...
I think this is the least bad option. I don't think it distracts from the tests that much, especially if you add a section of the file, marked off by a comment header, for helper methods.
For the flaky test: it doesn't seem to have flaked before, so I do wonder if it's down to changes in the expert policy? Even if it's similarly strong, small differences making it harder for DAgger to learn could be enough to trigger this.
I'd suggest using pytest-flakefinder to see if you can replicate this on master vs. in this PR. If there's no difference, then we can go ahead and fix it in another PR. If it's new to this PR, we should solve it before merging. It might be as simple as bumping the number of iterations we train DAgger for in `test_trainer_makes_progress` from 6 to e.g. 10.
I moved the data loader back to `test_bc.py`.

The test is only flaky on the macOS runner, so this will take some more time to investigate since I don't have a macOS system right now.