
Retry Hurl files as a whole

Open simonzkl opened this issue 1 year ago • 7 comments

Problem to solve

When running in test mode, I'd like to be able to retry Hurl files as a whole, as opposed to retrying individual requests. The reason I'd like to add retries is to catch flaky tests (a test fails, but passes after a subsequent retry).

The per-request retry mechanism is not ideal for catching flaky tests:

  1. Per-request retry is often used as a workaround for eventually consistent APIs that take some time to refresh. These are expected, temporary failures, not indicative of a flaky test.
  2. Some requests might not be idempotent, and cannot be retried without starting from the beginning of the test.

Proposal

  • Add a flag (e.g. --test-retry <n>) that retries the entire test file from the beginning (see the example after this list)
  • Should the test suite as a whole pass if all tests pass after retries?
    • For detecting flaky tests, you might still want to exit with a non-zero exit code.
    • Possibly configurable through another flag?
  • JUnit reports should log all test attempts, not just successful ones, so it's easy to spot flaky tests from reports.
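
For illustration (the flag is hypothetical at this point), a run could look like:

$ hurl --test --test-retry 2 a.hurl b.hurl c.hurl

where each failing file would be re-run from the top, up to 2 extra times, while passing files run only once.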

Additional context and resources

N/A

Tasks to complete

TODO

simonzkl avatar Jan 11 '25 11:01 simonzkl

Hi @simonzkl

This could potentially be difficult to implement: for instance, in --test mode, tests are run in parallel and the list of all tests to execute is constructed at the start of the run. Adding test files "on the fly" because they've failed is feasible, but could be tricky with the current code.

For your use case, is it possible to use the --repeat option? If you have 3 files, for instance:

$ hurl --test --repeat 10 a.hurl b.hurl c.hurl

will run (a, b, c) in a loop 10 times.

I find it even better than a flag that retries only failed tests, because a flaky test can pass on the first run and never be identified as flaky (and the --repeat option already exists, of course!)

jcamiel avatar Jan 11 '25 16:01 jcamiel

Hm, the test suite can actually take a while to run, so my preference would be to retry only the tests that failed. I'm not trying to explicitly hunt for flaky tests, just to mark tests as flaky as failures occur in regular CI pipelines, without significantly extending the runtime of those pipelines. Running tests on repeat would be a better fit for a scheduled job.

If it's tricky to retry tests on the spot, maybe it would be easier to aggregate them and re-run them at the end? I suppose I could also write a wrapper script to do this if the use case seems niche, though I feel this is a pretty common practice in test runners.
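
For example, a minimal wrapper (just a sketch, relying only on the existing --test flag and hurl returning a non-zero exit code on failure) could look like:

#!/bin/sh
# Run each Hurl file once; give failed files a single retry.
status=0
for f in *.hurl; do
    hurl --test "$f" || hurl --test "$f" || status=1
done
exit $status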

simonzkl avatar Jan 11 '25 18:01 simonzkl

@jcamiel Happy to try and implement this myself, but I'd like to confirm with the team if this is something that's actually desired before investing time into it.

I can try and see if adding tests on demand is feasible without massive changes. If not, maybe we could just retry them at the end for the initial version of this feature.

simonzkl avatar Feb 11 '25 13:02 simonzkl

Hi @simonzkl, you can work on it as a POC of course, but with really no guarantee that it will be merged. We take our time discussing the use cases and what we need to change; we don't want to add too many features that we won't be able to maintain.

jcamiel avatar Feb 11 '25 13:02 jcamiel

Understandable, especially if the implementation ends up adding too much complexity. I just wanted to get a sense of whether this is desired at a high level. I'll review what's necessary and see.

simonzkl avatar Feb 11 '25 13:02 simonzkl

I did a quick grok through the code, and the most obvious way I can see this working is:

  • Tests are retried immediately in the Worker itself, preventing it from advancing to the next job. Progress is not updated until the test passes or the retry threshold is reached.
  • The HurlResult in JobResult is adapted to a Vec<HurlResult> so it can report back all the job attempts.

This part seems fairly easy to implement in theory. I think the trickiest bits would be in reporting, if we wanted to signify flaky tests somehow. Personally I don't think it's essential (as long as failures can be picked up from the final JUnit report), but it would be good UX.
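
To make the idea concrete, here's a minimal sketch of that retry loop; the types and names (Job, HurlResult, run_job) are simplified stand-ins, not the actual runner internals:

// Stand-ins for the real types in hurl's parallel runner.
struct Job;
struct HurlResult {
    success: bool, // plus entries, timings, etc. in the real type
}

// Placeholder for executing one Hurl file.
fn run_job(_job: &Job) -> HurlResult {
    HurlResult { success: true }
}

/// Run a job, retrying the whole file until it passes or the retry
/// budget is exhausted. All attempts are returned so reports (e.g.
/// JUnit) can include every run, not just the last one.
fn run_with_retries(job: &Job, max_retries: usize) -> Vec<HurlResult> {
    let mut attempts = Vec::with_capacity(1 + max_retries);
    for _ in 0..=max_retries {
        let result = run_job(job);
        let passed = result.success;
        attempts.push(result);
        if passed {
            break; // stop retrying once the file passes
        }
    }
    attempts
}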

simonzkl avatar Feb 11 '25 20:02 simonzkl

Here's a quick PoC for what I mean: https://github.com/Orange-OpenSource/hurl/commit/fe02b6727b35d3c91e1e34caf0ff6006f80a28e8. I can finish/clean this up if the high-level approach is considered ok.

Todo:

  • Add CLI flags
  • Update sequential runner as well
  • Carefully consider situations where we only care about the most recent Hurl result (vs all results)
  • Log retries / make retries more obvious
  • (Integration) tests

Nice to have(?):

  • Improve progress reports. Failed asserts are not reported until all retries have been attempted
  • Configurable exit code (should flaky tests be considered as a test suite failure?)

simonzkl avatar Feb 11 '25 22:02 simonzkl