imitation icon indicating copy to clipboard operation
imitation copied to clipboard

Introduce hypothesis testing to BC tests

Open ernestum opened this issue 2 years ago • 7 comments

Description

This PR introduces hypothesis tests for BC and does some general cleanup such as introducing a GIVEN, WHEN, THEN structure where appropriate and renaming tests to make it clear what they test.

  • [ ] ~~Make sure the hypothesis cache is preserved~~ (no decent way to do this, not even the hypothesis ppl do it in their CI)
  • [x] Move test utilities to a different module
  • [x] Investigate if _DataLoaderThatFailsOnNthIter really behaves as intended (it does)
  • [x] Stop to parameterize the vecenv size where it makes no sense. (decided it makes sense everywhere)

Testing

Na, I did not test the tests ...

ernestum avatar Sep 19 '22 15:09 ernestum

Codecov Report

Merging #569 (5650240) into master (288c25a) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #569   +/-   ##
=======================================
  Coverage   97.47%   97.48%           
=======================================
  Files          83       85    +2     
  Lines        8082     8105   +23     
=======================================
+ Hits         7878     7901   +23     
  Misses        204      204           
Impacted Files Coverage Δ
src/imitation/testing/expert_trajectories.py 100.00% <100.00%> (ø)
tests/algorithms/conftest.py 100.00% <100.00%> (ø)
tests/algorithms/test_bc.py 100.00% <100.00%> (ø)
tests/algorithms/test_dagger.py 100.00% <100.00%> (ø)
tests/conftest.py 100.00% <100.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Sep 20 '22 15:09 codecov[bot]

Also one test is failing on Mac OS (maybe flaky? I don't see anything Mac OS specific). And type checking failing --- I think pytype may be confused by the presence of two conftest.py files, adding __init__.py to tests/algorithms may help with this.

AdamGleave avatar Oct 06 '22 03:10 AdamGleave

Also one test is failing on Mac OS (maybe flaky? I don't see anything Mac OS specific)

Me neither. It definitively is one of the flaky tests. Should probably be investigated in a separate PR.

ernestum avatar Oct 06 '22 11:10 ernestum

And type checking failing

Yep, thanks for the hint. That fixed it.

ernestum avatar Oct 06 '22 11:10 ernestum

I'm a bit unsure about adding some of the additions imitation.testing. It seems best to reserve this for generally useful methods, that will either be used across multiple files in our test suite (and doesn't make sense to share as a fixture in conftest) or by users of the library.

I agree. I was unsure where to place all the helpers too. In the end I decided to move them out of the way to improve readability of the tests. Placing them into conftest.py was somewhat weird because of the special treatment it gets. I assumed it was reserved for fixtures? Where would you recommend to put it? Maybe just moving the failing data loader back to the test is reasonable ...

ernestum avatar Oct 06 '22 11:10 ernestum

Maybe just moving the failing data loader back to the test is reasonable ...

I think this is the least bad option. I don't think it distracts from the tests that much, especially if you add a part of the file designated by a comment header for helper methods.

AdamGleave avatar Oct 06 '22 18:10 AdamGleave

For the flaky test: it doesn't seem to have flaked before, so I do wonder if it's down to changes in the expert policy? Even if it's similarly strong, small differences making it harder for DAgger to learn could be enough to trigger this.

I'd suggest using pytest-flakefinder to see if you can replicate this on master vs in this PR. If no difference then we can go ahead and fix it in another PR. If it's new to this PR we should solve it before merging. It might be as simple as bumping the number of iterations we train DAgger for in test_trainer_makes_progress from 6 to e.g. 10.

AdamGleave avatar Oct 06 '22 18:10 AdamGleave

I moved the data loader back to test_bc.py.

The test is only flaky on the MacOS runner. So this will take some more time to investigate since I don't have a MacOS system right now.

ernestum avatar Oct 13 '22 07:10 ernestum