openfe icon indicating copy to clipboard operation
openfe copied to clipboard

Evaluate how we want to run flaky tests

Open mikemhenry opened this issue 8 months ago • 4 comments
trafficstars

See https://github.com/OpenFreeEnergy/openfe/pull/1152

We need a more comprehensive plan on how we want to handle flaky tests. There are two different pytest plugins that do it. Another consideration is how they seem to not play well with xdist (sometimes I've noticed if the test crashes hard enough, it can't retry it)

codecov also now offers tracking of flaky tests, so maybe we remove all the retry marks and just kinda see what happens. I don't look at the logs close enough to see how often things need to be retried, and if we set codcov to track this (which is easy to do) then we can get some stats on what tests are actually flaky.

mikemhenry avatar Feb 24 '25 22:02 mikemhenry

I'm not sure I understand why allowing the tests to fail is a good option here.

Couple of thoughts:

  1. If we remove options to retry certain tests when they fail, you then end up in a situation where the test might fail (say in a PR) and you have to go out of your way to know "o yeah ok, that's a flaky test so I guess it's fine". Given tests labelled as flaky usually fail often enough, this seems like a more frequent bother than "every so often it'll crash hard and will fail".
  2. We ask our users to run tests as a means of validating their installs. Having a flaky test fail is bound to create confusion. Somehow we'll have to communicate "o ok yeah no that test can sometimes fail" in a clear manner - which can be a lot more painful than "can you re-run this with fewer threads?".

IAlibay avatar Feb 26 '25 12:02 IAlibay

Sorry I wasn't clear, I am not advocating that we allow failures, I just want to harmonize on the restart method. Like sometimes tests fail because due credit gets an http error when resolving a DOI, so if that happens I want the test to re-run. We can put a limit like, "retry test up to 3 times" and it will be fail if it doesn't work after 3 attempts.

mikemhenry avatar Mar 11 '25 18:03 mikemhenry

I just want to harmonize on the restart method.

Ah ok, that's sounds great to me, thanks for explaining this @mikemhenry !

IAlibay avatar Mar 11 '25 19:03 IAlibay

For example, this test: ERROR openfe/tests/protocols/test_openmmutils.py::test_forward_backwards_failure - requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='figshare.com', port=443): Read timed out. (read timeout=30)

Would benefit from a retry since it is just a HTTP issue

mikemhenry avatar Mar 26 '25 14:03 mikemhenry