
Feature Request: Run unit test multiple times

Open JarredAllen opened this issue 2 years ago • 14 comments

Problem

In my code, there are some flaky unit tests that pass most, but not all, of the time. To check whether a change has fixed the flakiness, it would be convenient to run the test many times and see if it ever fails, but as far as I can tell cargo doesn't have a way of doing this.

Proposed Solution

It'd be nice if there were a command-line flag to run tests repeatedly. I'm imagining syntax like cargo test --repeat=100 testname, which would search for tests named "testname" (as cargo presently does) and then run the matching test(s) 100 times, but I'm not too picky about the exact syntax.
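Until such a flag exists, one stopgap (a sketch only, not anything cargo provides) is to loop over the flaky body inside the test itself:

```rust
// Stopgap until a `--repeat` flag exists: loop the flaky body inside the
// test. A panic on any iteration fails the whole test; `i` in the message
// tells you which run failed.
fn repeat(times: usize, body: impl Fn(usize)) {
    for i in 0..times {
        body(i);
    }
}

#[test]
fn flaky_behavior_holds_100_times() {
    repeat(100, |i| {
        // stand-in for the real flaky assertion
        assert_eq!(1 + 1, 2, "failed on run {i}");
    });
}
```

This loses per-run reporting and has to be written into each test by hand, which is part of why a first-class flag would be nicer.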

Notes

No response

JarredAllen avatar Nov 09 '22 19:11 JarredAllen

This would be very helpful for my use cases as well!

jswang avatar Nov 09 '22 19:11 jswang

This would definitely be useful, I have a macro in my editor for repeating a test. Something built-in would be nicer. However, it is not clear exactly how this should work. For example, it may be better for this to be implemented in the harness itself, in which case https://github.com/rust-lang/rust/issues/65218 would be the issue for that.

ehuss avatar Nov 09 '22 20:11 ehuss

While this is in development, you can also use cargo nextest, which does this.

andrewgazelka avatar Nov 10 '22 01:11 andrewgazelka

@JarredAllen How do I get this working ?

ImmanuelSegol avatar Mar 04 '23 03:03 ImmanuelSegol

> @JarredAllen How do I get this working ?

This is a feature request. It still needs to be implemented.

andrewgazelka avatar Mar 08 '23 19:03 andrewgazelka

It seems this work is fairly important. I am currently learning the testing-related code; may I try to complete this issue? Is there anything I should pay special attention to, or are there other plans?

heisen-li avatar Dec 22 '23 12:12 heisen-li

This is marked needs-design, meaning someone needs to put forward a more detailed proposal for what to do before we move forward with implementation.

In particular, we need to figure out which combination of layers this belongs in:

  • if libtest, t-libs-api is likely to defer that to custom test harnesses
  • if cargo test, with cargo itself simply repeating the runs of the selected tests
  • or some other design that mixes these

epage avatar Dec 22 '23 13:12 epage

Sorry for sharing my modest view; it seems best to make the modification in libtest, since fine-grained control is not possible from cargo.

heisen-li avatar Dec 27 '23 12:12 heisen-li

Our plans for cargo would allow fine control in the future.

We are looking at making cargo and libtest communicate through a greater knowledge of each other's CLIs, including being able to enable JSON output, putting the responsibility for rendering on cargo.

This would allow cargo test to track individual tests and decide what to do with them, like re-running a failed test.

What would be good is to explore prior art to see if it has any effect on the design. For example, would people want to be able to annotate individual tests about retrying? If so, we'd either want retrying within the test harness, or that would be good feedback for the test runner/harness communication.

epage avatar Dec 27 '23 16:12 epage

> For example, would people want to be able to annotate individual tests about retrying?

The way you say "retry" and "annotate" makes me think there are two separate use cases in people's minds here:

  1. I have discovered that a test is flaky. I am trying to debug/fix it, I need to run it 1000 times to reproduce the failure/evaluate my fix.
  2. Our tests are flaky, we will use retries to make our CI dashboard green.

For use case 1, I have found that a single global "repeat all selected tests N times" command-line flag works fine. I have seen various names for this; I think --runs-per-test is my favourite because it makes the "repeat all tests" behavior obvious. https://github.com/rust-lang/rust/issues/65218 mentions a couple of examples of prior art for this; I have used both of those examples and they work well. I think this is also what the OP suggested.
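The semantics such a flag implies can be sketched in plain Rust, assuming a test is just a closure that panics on failure (the flag name and helper below are illustrative, not an existing cargo/libtest API):

```rust
use std::panic;

// Sketch of "--runs-per-test" semantics: run one test N times and tally
// the results. Illustrative only; not a real cargo/libtest API.
fn runs_per_test(n: usize, test: impl Fn()) -> (usize, usize) {
    let mut passed = 0;
    for _ in 0..n {
        // AssertUnwindSafe: we only tally pass/fail here; any state a
        // failing run leaves behind is the test author's concern.
        if panic::catch_unwind(panic::AssertUnwindSafe(&test)).is_ok() {
            passed += 1;
        }
    }
    (passed, n - passed)
}
```

A harness exposing this would run every selected test N times and report the pass/fail tally, which is exactly what debugging a flake (use case 1) needs.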

For use case 2, there is the philosophical question of how opinionated Cargo wants to be about software engineering. I have worked on a project where the test tools have a global "retry any test that fails" flag. This probably sounds like a bad idea, and my experience has indeed suggested that, exactly as you would suspect, using it means your tests get flakier over time instead of less flaky. My impression of Rust culture is that people would instinctively agree this is a harmful feature for test tools to have.

A less toxic design IMO is to be able to annotate individual tests as "known flaky", and then leave it up to the person/tool running the tests to decide whether that means "don't bother running it at all" or "run it up to N times until it passes". Google's monorepo has a tag that works like that and it seems OK to me. This means you can maintain your nice green CI dashboard, but you have a ratchet where you at least notice if a formerly stable test becomes flaky. It also means that if you run a test against your WIP PR and it fails, and the test doesn't have the "flaky" tag, you know you probably broke it. Whereas with the global retry flag you have to go look through your CI history and do some informal Bayesian analysis.
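The "run it up to N times until it passes" interpretation of such an annotation could bottom out in retry-until-pass semantics; a minimal sketch, where the helper name is hypothetical rather than any existing libtest API:

```rust
use std::panic;

// Hypothetical retry-until-pass semantics for a test tagged "known flaky":
// attempt up to `max_attempts` times, stopping at the first pass. Returns
// true if any attempt passed. Illustrative only; not a real libtest API.
fn retry_until_pass(max_attempts: usize, test: impl Fn()) -> bool {
    (0..max_attempts)
        .any(|_| panic::catch_unwind(panic::AssertUnwindSafe(&test)).is_ok())
}
```

A runner built on this primitive would only apply it to tagged tests, so untagged tests still fail on their first flake, preserving the ratchet described above.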

Anyway, I think this FR was probably motivated by use case 1, and that use case seems much easier to solve, so it might make sense to focus on that.

bjackman avatar Apr 15 '24 08:04 bjackman