
Retry Hurl files as a whole

Open simonzkl opened this issue 1 year ago • 7 comments

Problem to solve

When running in test mode, I'd like to be able to retry Hurl files as a whole, as opposed to retrying individual requests. The reason I'd like to add retries is to catch flaky tests (a test fails, but passes after a subsequent retry).

The per-request retry mechanism is not ideal for catching flaky tests:

  1. Per-request retry is often used as a workaround for eventually consistent APIs that take some time to refresh. These are expected, temporary failures, not indicative of a flaky test.
  2. Some requests might not be idempotent, and cannot be retried without starting from the beginning of the test.

Proposal

  • Add a flag (e.g. --test-retry <n>) that retries the entire test file from the beginning (see the example after this list)
  • Should the test suite as a whole pass if all tests pass after retries?
    • For detecting flaky tests, you might still want to exit with a non-zero exit code.
    • Possibly configurable through another flag?
  • JUnit reports should log all test attempts, not just successful ones, so it's easy to spot flaky tests from reports.
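
For illustration (the flag is hypothetical at this point), a run could look like:

$ hurl --test --test-retry 2 a.hurl b.hurl c.hurl

where each failing file would be re-run from the top, up to 2 extra times, while passing files run only once.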

Additional context and resources

N/A

Tasks to complete

TODO

simonzkl avatar Jan 11 '25 11:01 simonzkl

Hi @simonzkl

This could potentially be difficult to implement: for instance, in --test mode, tests are run in parallel and the list of all tests to execute is constructed at the start of the run. Adding test files "on the fly" because they've failed is feasible, but could be tricky with the current code.

For your use case, is it possible to use the --repeat option? If you have 3 files, for instance:

$ hurl --test --repeat 10 a.hurl b.hurl c.hurl

will run (a, b, c) in a loop 10 times.

I find it even better than a flag that retries only failed tests, because a flaky test can pass on the first run and never be identified as flaky (and the --repeat option already exists, of course!)

jcamiel avatar Jan 11 '25 16:01 jcamiel

Hm, the test suite can actually take a while to run, so my preference would be to retry only the tests that failed. I'm not trying to explicitly hunt for flaky tests, just to mark tests as flaky as failures occur in regular CI pipelines, without significantly extending the runtime of those pipelines. Running tests on repeat would be a better fit for a scheduled job.

If it's tricky to retry tests on the spot, maybe it would be easier to aggregate them and re-run them at the end? I suppose I could also write a wrapper script to do this if the use case seems niche, though I feel this is a pretty common practice in test runners.
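
For example, a minimal wrapper (just a sketch, relying only on the existing --test flag and hurl returning a non-zero exit code on failure) could look like:

#!/bin/sh
# Run each Hurl file once; give failed files a single retry.
status=0
for f in *.hurl; do
    hurl --test "$f" || hurl --test "$f" || status=1
done
exit $status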

simonzkl avatar Jan 11 '25 18:01 simonzkl

@jcamiel Happy to try and implement this myself, but I'd like to confirm with the team if this is something that's actually desired before investing time into it.

I can try and see if adding tests on demand is feasible without massive changes. If not, maybe we could just retry them at the end for the initial version of this feature.

simonzkl avatar Feb 11 '25 13:02 simonzkl

Hi @simonzkl, you can work on it as a POC of course, but with really no guarantee that it will be merged. We take our time discussing the use cases and what we need to change; we don't want to add too many features that we won't be able to maintain.

jcamiel avatar Feb 11 '25 13:02 jcamiel

Understandable, especially if the implementation ends up adding too much complexity. I just wanted to get a sense of whether this is desired at a high level. I'll review what's necessary and see.

simonzkl avatar Feb 11 '25 13:02 simonzkl

I did a quick grok through the code, and the most obvious way I can see this working is:

  • Tests are retried immediately in the Worker itself, preventing it from advancing to the next job. Progress is not updated until the test passes or the retry threshold is reached.
  • The HurlResult in JobResult is adapted to a Vec<HurlResult> so it can report back all the job attempts.

This part seems fairly easy to implement in theory. I think the trickiest bits would be in reporting, if we wanted to signify flaky tests somehow. Personally I don't think it's essential (as long as failures can be picked up from the final JUnit report), but it would be good UX.
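
To make the idea concrete, here's a minimal sketch of that retry loop; the types and names (Job, HurlResult, run_job) are simplified stand-ins, not the actual runner internals:

// Stand-ins for the real types in hurl's parallel runner.
struct Job;
struct HurlResult {
    success: bool, // plus entries, timings, etc. in the real type
}

// Placeholder for executing one Hurl file.
fn run_job(_job: &Job) -> HurlResult {
    HurlResult { success: true }
}

/// Run a job, retrying the whole file until it passes or the retry
/// budget is exhausted. All attempts are returned so reports (e.g.
/// JUnit) can include every run, not just the last one.
fn run_with_retries(job: &Job, max_retries: usize) -> Vec<HurlResult> {
    let mut attempts = Vec::with_capacity(1 + max_retries);
    for _ in 0..=max_retries {
        let result = run_job(job);
        let passed = result.success;
        attempts.push(result);
        if passed {
            break; // stop retrying once the file passes
        }
    }
    attempts
}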

simonzkl avatar Feb 11 '25 20:02 simonzkl

Here's a quick PoC for what I mean: https://github.com/Orange-OpenSource/hurl/commit/fe02b6727b35d3c91e1e34caf0ff6006f80a28e8. I can finish/clean this up if the high-level approach is considered ok.

Todo:

  • Add CLI flags
  • Update sequential runner as well
  • Carefully consider situations where we only care about the most recent Hurl result (vs all results)
  • Log retries / make retries more obvious
  • (Integration) tests

Nice to have(?):

  • Improve progress reports. Failed asserts are not reported until all retries have been attempted
  • Configurable exit code (should flaky tests be considered as a test suite failure?)

simonzkl avatar Feb 11 '25 22:02 simonzkl