khmer
[WIP] Add infrastructure for fixed random seeds in tests
Proposal for fixing random seeds in tests to make them reproducible.
The basic idea is that if you need randomness, you call check_random_state(seed) and expose the seed argument to your callers. If they don't care, they simply don't pass anything in; then seed=None and we get random numbers that aren't reproducible. If they do care, they can pass in either a number or an existing random number generator.
@standage this is what I had in mind. Shamelessly cribbed from http://scikit-learn.org/stable/modules/generated/sklearn.utils.check_random_state.html
We also use this approach in scikit-optimize.
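For reference, a minimal sketch of that pattern, modelled on scikit-learn's check_random_state; the subsample_reads function and its signature are hypothetical, purely for illustration, not khmer's actual API:

```python
import numbers

import numpy as np


def check_random_state(seed):
    """Turn seed into a numpy.random.RandomState instance."""
    if seed is None:
        # Fall back to the module-level global RandomState (not reproducible).
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # An int gives a fresh, reproducibly seeded generator.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # An existing generator is passed through unchanged.
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % seed)


def subsample_reads(reads, n, random_state=None):
    """Hypothetical caller: expose the seed and forward it to the helper."""
    rng = check_random_state(random_state)
    picks = rng.choice(len(reads), size=n, replace=False)
    return [reads[i] for i in picks]
```

Callers who want reproducibility would pass an int (e.g. random_state=42) or a shared RandomState; everyone else just omits the argument.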
- [ ] Is it mergeable?
- [ ] Did it pass the tests? (`make test`)
- [ ] If it introduces new functionality in `scripts/`, is it tested? (`make clean diff-cover`)
- [ ] Is it well formatted? (`make format diff_pylint_report doc pydocstyle`)
- [ ] Did it change the command-line interface? Only backwards-compatible additions are allowed without a major version increment. Changing file formats also requires a major version number increment.
- [ ] For substantial changes or changes to the command-line interface, is it documented in CHANGELOG.md? See keepachangelog for more details.
- [ ] Was a spellchecker run on the source code and documentation after changes were made?
- [ ] Do the changes respect streaming IO? (Are they tested for streaming IO?)
Got it.
So at some point we need to print out the random seed being used so that when tests fail we can reproduce their behavior, right?
Not really. If we do everything right in writing the tests, they always use the same seed on all platforms. The only time things change is when you reorder or change the number of calls to the RNG, and usually you don't do that while debugging a failure. Alternatively, you can go in and fix the seeds for the intermediate steps.
(I've seen setups where you pass different seeds as parameters to a test_blah(seed) method if you want to run the same test with several seeds, though I would argue that the value of the seed shouldn't change the outcome of your test.)
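For illustration, a pytest-style sketch of that pattern, reusing the hypothetical subsample_reads from the sketch above; note the assertion checks determinism rather than any seed-specific value:

```python
import pytest


@pytest.mark.parametrize("seed", [1, 7, 42])
def test_subsample_is_deterministic(seed):
    reads = ["read%d" % i for i in range(100)]
    # The same seed must give the same result on every run and platform;
    # the assertion should hold no matter which seed we parametrize with.
    first = subsample_reads(reads, 10, random_state=seed)
    second = subsample_reads(reads, 10, random_state=seed)
    assert first == second
```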
If you like it, I will apply the approach more widely across the tests.
> If we do everything right in writing the tests, they always use the same seed on all platforms.
Ah, I see. I misunderstood.
In contrast, the GenomeTools library (to which I was previously a frequent contributor) has the CI generate and report a new random seed for each build. This way it tests the overall robustness of the randomized procedures, not just their behavior for a single seed. It was pretty rare, but every once in a while one of the tests would fail on an edge case, and the reported seed would enable us to track down exactly what happened.
Of course, this often requires a different approach to evaluating the output, so I don't know how relevant it would be here, especially in the short term.
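For context, a minimal sketch of that CI pattern: pick a fresh seed per build, report it in the log, and allow it to be pinned when replaying a failure. The TEST_SEED variable name and the surrounding setup are assumptions for illustration, not anything khmer or GenomeTools actually defines.

```python
import os
import random

import numpy as np

# Use a fresh seed unless one is supplied to reproduce a failing build.
seed = int(os.environ.get("TEST_SEED", random.SystemRandom().randrange(2**32)))
print("Using TEST_SEED=%d" % seed)

# All randomized test fixtures would then draw from this generator.
rng = np.random.RandomState(seed)
```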
Codecov Report
Merging #1655 into master will increase coverage by 0.01%. The diff coverage is 88.88%.
@@ Coverage Diff @@
## master #1655 +/- ##
==========================================
+ Coverage 69.81% 69.82% +0.01%
==========================================
Files 66 66
Lines 8971 8978 +7
Branches 3060 3063 +3
==========================================
+ Hits 6263 6269 +6
- Misses 1025 1027 +2
+ Partials 1683 1682 -1
| Impacted Files | Coverage Δ | |
|---|---|---|
| khmer/utils.py | 98.47% <88.88%> (-0.71%) | :arrow_down: |
| lib/hashtable.cc | 55.18% <0%> (-0.22%) | :arrow_down: |
| lib/hashgraph.cc | 46.49% <0%> (-0.13%) | :arrow_down: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fbbb3c1...b188902.