
[WIP] Add infrastructure for fixed random seeds in tests

Open betatim opened this issue 8 years ago • 5 comments

Proposal for fixing random seeds in tests to make them reproducible.

The basic idea is that if you need randomness, you call check_random_state(seed) and expose the seed argument to your callers. If they don't care, they can omit it, in which case seed=None and the random numbers aren't reproducible. If they do care, they can pass in either an integer or an existing random number generator.

@standage this is what I had in mind. Shamelessly cribbed from http://scikit-learn.org/stable/modules/generated/sklearn.utils.check_random_state.html

We also use this approach in scikit-optimize.
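As a minimal sketch of the pattern: `check_random_state` below mirrors the sklearn utility linked above, and `shuffled_reads` is a hypothetical caller showing how the `seed` argument would be threaded through.

```python
import numbers

import numpy as np


def check_random_state(seed):
    """Turn seed into a numpy RandomState instance.

    Mirrors sklearn.utils.check_random_state: None -> the global
    RandomState, int -> a fresh RandomState seeded with it, an existing
    RandomState -> passed through unchanged.
    """
    if seed is None or seed is np.random:
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


def shuffled_reads(reads, seed=None):
    # Hypothetical caller: expose `seed` so callers control reproducibility.
    rng = check_random_state(seed)
    reads = list(reads)
    rng.shuffle(reads)
    return reads


# Same integer seed -> same ordering; seed=None -> non-reproducible output.
assert shuffled_reads(range(10), seed=42) == shuffled_reads(range(10), seed=42)
```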

  • [ ] Is it mergeable?
  • [ ] `make test`: Did it pass the tests?
  • [ ] `make clean diff-cover`: If it introduces new functionality in `scripts/`, is it tested?
  • [ ] `make format diff_pylint_report doc pydocstyle`: Is it well formatted?
  • [ ] Did it change the command-line interface? Only backwards-compatible additions are allowed without a major version increment. Changing file formats also requires a major version number increment.
  • [ ] For substantial changes or changes to the command-line interface, is it documented in CHANGELOG.md? See keepachangelog for more details.
  • [ ] Was a spellchecker run on the source code and documentation after changes were made?
  • [ ] Do the changes respect streaming IO? (Are they tested for streaming IO?)

betatim avatar Mar 01 '17 17:03 betatim

Got it.

So at some point we need to print out the random seed being used so that when tests fail we can reproduce their behavior, right?

standage avatar Mar 02 '17 07:03 standage

Not really. If we do everything right when writing the tests, they always use the same seed on all platforms. The only time things change is when you reorder or change the number of calls to the RNG, and you usually don't do that while debugging a failure. Alternatively, you can go in and fix the seeds for the intermediate steps.

(I've seen it done where you provide different seeds as parameters to a test_blah(seed) method if you want to run the same test for different seeds. Though I would argue that the value of the seed shouldn't change the outcome of your test.)
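That style could look like the following pytest sketch (the test name, seed values, and shuffle example are illustrative, not from khmer): the property being asserted must hold for every seed, so the seed only fixes the data, never the outcome.

```python
import numpy as np
import pytest


# Run the same property check under several fixed seeds. If the
# assertion's truth depended on which seed was chosen, the test would
# be testing the seed, not the code.
@pytest.mark.parametrize("seed", [0, 13, 42])
def test_shuffle_preserves_elements(seed):
    rng = np.random.RandomState(seed)
    data = list(range(100))
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Outcome is independent of the seed's value: a shuffle is a
    # permutation, so the multiset of elements is unchanged.
    assert sorted(shuffled) == data
```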

betatim avatar Mar 02 '17 13:03 betatim

If you like it, I will spread the approach further around the tests.

betatim avatar Mar 02 '17 15:03 betatim

If we do everything right when writing the tests, they always use the same seed on all platforms.

Ah, I see. I misunderstood.

In contrast, the GenomeTools library (to which I was previously a frequent contributor) has the CI generate and report a new random seed for each build. This way you're testing the overall robustness of the randomized procedures, not just a single seed. It was pretty rare, but every once in a while one of the tests would fail on an edge case, and the reported seed would let us track down exactly what happened.

Of course, this often requires a different approach to evaluating the output, so I don't know how relevant it would be here, especially in the short term.
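A minimal sketch of that CI scheme, under assumptions not taken from either project: a hypothetical `TEST_SEED` environment variable and a `build_seed` helper. Each run draws a fresh seed and prints it to the build log; a failure can then be replayed by exporting the printed value.

```python
import os
import random
import sys


def build_seed():
    """Return the seed for this test run.

    Honor a TEST_SEED environment variable if set (for replaying a
    failed build); otherwise draw a fresh seed and report it in the
    log so the run can be reproduced later.
    """
    env = os.environ.get("TEST_SEED")
    seed = int(env) if env is not None else random.SystemRandom().randrange(2**32)
    print(f"TEST_SEED={seed}", file=sys.stderr)  # recorded in the CI log
    return seed


# All randomized tests would then share one RNG built from this seed.
rng = random.Random(build_seed())
```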

standage avatar Mar 02 '17 17:03 standage

Codecov Report

Merging #1655 into master will increase coverage by 0.01%. The diff coverage is 88.88%.


@@            Coverage Diff             @@
##           master    #1655      +/-   ##
==========================================
+ Coverage   69.81%   69.82%   +0.01%     
==========================================
  Files          66       66              
  Lines        8971     8978       +7     
  Branches     3060     3063       +3     
==========================================
+ Hits         6263     6269       +6     
- Misses       1025     1027       +2     
+ Partials     1683     1682       -1
Impacted Files Coverage Δ
khmer/utils.py 98.47% <88.88%> (-0.71%) :arrow_down:
lib/hashtable.cc 55.18% <0%> (-0.22%) :arrow_down:
lib/hashgraph.cc 46.49% <0%> (-0.13%) :arrow_down:

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update fbbb3c1...b188902.

codecov-io avatar Mar 03 '17 14:03 codecov-io