maven-surefire [SUREFIRE-1638] Support running tests until failure or N test runs

This is an initial proposal to add a Surefire/Failsafe property called untilFailureLoopCount.

I often find myself in situations where a test fails randomly in CI, and I have to run that test (or several tests) continuously until it fails, possibly with TRACE logging enabled.

Normally, I'd wrap the mvn test... call in a script that invokes it in a loop, but this can be quite "slow", since each iteration launches a new JVM, etc.

The idea behind untilFailureLoopCount is that you give it a positive number, say 1000, and the test(s) run continuously until either a failure happens or the test has been executed that many times.
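To make the proposed semantics concrete, here is a minimal sketch of the loop in plain Java, with no Surefire or JUnit dependency. The class UntilFailureLoop and the method runUntilFailure are hypothetical names for illustration only, not Surefire's API or the code in this PR; the test body is modeled as a Runnable. The point it shows is that all iterations happen inside one JVM, stopping at the first failure or after N runs.

```java
public class UntilFailureLoop {

    /**
     * Hypothetical helper: runs testBody up to maxRuns times in the current JVM,
     * stopping at the first failure or error.
     * Returns the 1-based run number that failed, or -1 if every run passed.
     */
    public static int runUntilFailure(Runnable testBody, int maxRuns) {
        if (maxRuns <= 0) {
            throw new IllegalArgumentException("maxRuns must be positive: " + maxRuns);
        }
        for (int run = 1; run <= maxRuns; run++) {
            try {
                testBody.run();
            } catch (AssertionError | RuntimeException e) {
                // A failed assertion (failure) or unchecked exception (error) ends the loop.
                System.out.println("Run " + run + " failed: " + e);
                return run;
            }
        }
        return -1; // every run passed
    }

    public static void main(String[] args) {
        // Deterministic stand-in for a flaky test: it fails on its 5th invocation.
        int[] calls = {0};
        Runnable flakyTest = () -> {
            calls[0]++;
            if (calls[0] == 5) {
                throw new AssertionError("Just messing with your testing");
            }
        };
        int failedRun = runUntilFailure(flakyTest, 10);
        System.out.println(failedRun == -1 ? "All runs passed" : "Stopped after run " + failedRun);
    }
}
```

With a limit of 10, the stand-in fails on its fifth call, so the loop stops there instead of running all ten iterations; a test that never fails would simply run ten times and report success.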

I've created this randomly failing test and run it with mvn -Dsurefire.untilFailureLoopCount=10 test, and the output looks something like this:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.034 s - in com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[INFO] Running com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[INFO] Running com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[INFO] Running com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[INFO] Running com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.001 s <<< FAILURE! - in com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest
[ERROR] com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest.testEventualFailure  Time elapsed: 0.001 s  <<< FAILURE!
java.lang.AssertionError: Just messing with your testing
	at com.acme.maven.surefire.untilfailure.UntilFailureLoopCountTest.testEventualFailure(UntilFailureLoopCountTest.java:16)

If maintainers are interested in this, I can complete the PR with tests, documentation and support for other providers.

Feb 13 '19 17:02 galderz

Thank you. It is very interesting.

Can you please create an issue on JIRA? issues.apache.org

Feb 13 '19 18:02 eolivelli

I've created SUREFIRE-1638. So, shall I go ahead with the remaining missing bits?

Feb 14 '19 11:02 galderz

@galderz sure, I am going to sponsor it

Feb 14 '19 11:02 eolivelli

@galderz

I think you are describing the existing skipAfterFailureCount feature. You should use it together with the re-run feature, which also exists.

Feb 14 '19 13:02 Tibor17

@Tibor17 skipAfterFailureCount and rerunFailingTestsCount don't achieve the same thing:

Since 2.19.1 you can use parameters skipAfterFailureCount and rerunFailingTestsCount together. This is enabled by providers surefire-junit4 and surefire-junit47. You can run again failed tests and skip the rest of the test-set if errors or failures reached skipAfterFailureCount. Notice that failed tests within re-run phase are not included in skipAfterFailureCount.

rerunFailingTestsCount does not make sense for my use case: I don't want to re-run failing tests. I'm waiting for just a single failure. Of course, I assume skipAfterFailureCount is 1, because the moment I get a failure I want things to stop. skipAfterFailureCount > 1 does not make sense for my problem. The moment I get a failure, I have the TRACE log I need and I start debugging.

Feb 15 '19 09:02 galderz

To add more detail: one might wonder why we don't set rerunFailingTestsCount > 0 when running in CI. That's because CI doesn't run with TRACE enabled, so having rerunFailingTestsCount > 0 there doesn't get us anything. When we find a randomly failing test, we try to isolate it, run it with TRACE, and run it until it fails.

Feb 15 '19 14:02 galderz

@galderz I have mentioned skipAfterFailureCount. Have you ever used it? Set skipAfterFailureCount=10 and the 11th error stops running more test classes.

Feb 15 '19 15:02 Tibor17

@Tibor17 But skipAfterFailureCount is not what I want. I want to stop after the first failure, so skipAfterFailureCount makes no sense in my use case. I want to keep the test running as long as no failures happen.

Feb 15 '19 16:02 galderz

@eolivelli @Tibor17 I don't know how else to explain this. I've even created a standalone project where I test this out, so if there's any other way this can be achieved I'm all ears, but please show me how to configure that project to keep running a test until it fails, or at most N times.

Feb 15 '19 16:02 galderz

@galderz OK, I think I understand that you want to run a class in a loop until it fails. My question is about the XML report expectations. We write the XML report after each run of the test class. This means that your solution overwrites each previously generated XML, so after the loop you cannot tell which iteration caused the failure.

Feb 15 '19 20:02 Tibor17

@galderz @olamy The reporter was not designed for this loop, and it will interpret the loop as the re-run feature. The reporter class is called stateless, but it is not stateless, because its state (a HashMap) is passed in via the constructor, which is necessary for re-runs. I think the last run wins with this feature. You should prevent both features from running together by throwing an exception in AbstractSurefireMojo. In SUREFIRE-1222 we want to mark the report statistics data as either a normal run or a re-run; then this HashMap and the algorithm should change so that they no longer depend on the order of runs. Caching can be done before the reporter, and the reporter can be called once, which means the class would be stateless again. In SUREFIRE-1638 we should do the same and mark the report statistics data. Let's do it after SUREFIRE-1222: mark the test runs of this feature so that the re-run handling is bypassed and the algorithm detects this feature instead.

Feb 16 '19 03:02 Tibor17

@Tibor17 So, how do you suggest it's done? I'm not sure I fully understood your suggestions.

From a reporting perspective, overwriting previous runs does not seem like an issue to me. Wouldn't you want the reports to be produced for whichever run is the last one (the failure, or the Nth run)?

Mar 04 '19 15:03 galderz

@galderz I understand the point of this feature. As I said before, your feature is not designed for StatelessXmlReporter, and the reporter thinks it is the re-run feature and not your feature. First of all, we have to change our shtg implementation. That is why we have milestones M4 to M6. Our re-run feature will work more effectively after this, and your feature will be guaranteed to work as expected, but we cannot accept a feature request immediately if we are aware that something is wrong. If we accepted every feature like this, we would soon have to throw this code away. So we need time to solve our internal problems, and then you will be notified that the time has come for your feature. I can promise it won't take too long. Several previous features also had to align with our milestones, and the result was better than what was originally requested.

Mar 04 '19 17:03 Tibor17

@Tibor17 That's fine, keep me posted when this can go in and I can adjust accordingly.

Mar 14 '19 11:03 galderz
