0.5: random test failures due to unreliable test timings
I am trying to update astroplan to the latest version 0.5 on Debian. When running the tests on different platforms, I get random failures like
> raise Flaky(message)
E hypothesis.errors.Flaky: Hypothesis test_boundaries(nside_pow=0, frac=0.4911781148020645, step=1, nest=True) produces unreliable results: Falsified on the first call but did not on a subsequent one
/usr/lib/python3/dist-packages/hypothesis/core.py:751: Flaky
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_boundaries(nside_pow=0, frac=0.4911781148020645, step=1, nest=True)
Unreliable test timings! On an initial run, this test took 376.64ms, which exceeded the deadline of 200.00ms, but on a subsequent run it took 5.95 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.
in one place or another. So far this has happened on MIPS (32- and 64-bit) and ARM 64-bit.
I would guess that the unreliable timing comes from the hardware and the load on our test machines; the MIPS machines in particular are rather slow. I don't really see why this should cause a test failure.
As suggested by the error message above, I could disable this by setting deadline=None; however, as far as I understand the "hypothesis" package, this has to be done individually for each test, which would make a Debian-specific patch rather unmaintainable. Is there a way to switch this off globally, and would you consider doing this in the upstream package? Or am I misunderstanding something here?
Cc: @lpsinger as the Debian package maintainer
@olebole Does this issue persist with the latest version of astroplan?
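On the question of a global switch: Hypothesis supports settings profiles, so the deadline can be turned off once for the whole test suite instead of per test. A minimal sketch, assuming the snippet goes into a `conftest.py` at the root of the test suite (the file location and the profile name `no_deadline` are assumptions for illustration, not something taken from the package):

```python
# conftest.py (assumed location: root of the test suite)
from hypothesis import settings

# "no_deadline" is an arbitrary profile name chosen for this sketch.
# deadline=None disables the per-example time limit, so slow or heavily
# loaded machines no longer trigger "Unreliable test timings" failures.
settings.register_profile("no_deadline", deadline=None)

# Make the profile the default for every test collected after this point.
settings.load_profile("no_deadline")
```

If the profile is only registered and not loaded, it should also be selectable at run time with pytest's `--hypothesis-profile=no_deadline` option, which would keep the default deadline for everyone else.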