
Add new SKIP macro for skipping tests at runtime

Open psalz opened this pull request 3 years ago • 12 comments

This adds a new SKIP macro for dynamically skipping tests at runtime. The "skipped" status of a test case is treated as a first-class citizen, like "succeeded" or "failed", and is reported with a new color on the console.

This is useful when the conditions for skipping a test are only known at runtime. For example:

TEST_CASE( "some feature does something on GPUs" ) {
    if ( get_gpu_count() == 0 ) {
        SKIP( "no GPUs are available" );
    }
    CHECK( do_something_on_gpu() );
}

The output then looks like this: [screenshot: console output with the test case reported as "skipped" in a new color]

Like other non-assertion type macros (such as INFO and WARN), skipping is still implemented as an "assertion" internally. I've added a new result type ExplicitSkip (which is currently not treated as a failure) as well as a new skipped counter in Catch::Totals. While this initial implementation was surprisingly simple, I've marked the PR as a draft as there are still several outstanding design questions.
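
For orientation, here is a minimal sketch of the shape of these additions; the enum values and surrounding members are illustrative, not the exact diff:

#include <cstdint>

struct ResultWas {
    enum OfType {
        Ok = 0,
        Info = 1,
        Warning = 2,
        ExplicitSkip = 4, // new: recorded like an assertion result, but not a failure
        // ... existing failure bits unchanged ...
    };
};

struct Counts {
    std::uint64_t passed = 0;
    std::uint64_t failed = 0;
    std::uint64_t failedButOk = 0;
    std::uint64_t skipped = 0; // new counter, surfaced through Catch::Totals
};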

For example, the result type is called an explicit skip because the notion of a "skipped" test already exists in the current codebase. In fact, there are at least three conditions under which tests could be considered skipped:

  • Tests that are hidden ([.])
  • Tests that are not matched by the provided filter
  • Remaining tests that are not executed after the --abortx limit has been reached

The last one is currently explicitly referred to as "skipped" within the IStreamingReporter::skipTest hook. As far as I can tell, only the Automake reporter is actually handling this one, though. I'm wondering whether there is an opportunity here for unifying some of these concepts, both in how they are handled internally and in the way they are reported.

Open TODOs:

  • [ ] Determine whether/how the notion of "skipping" should be unified within the codebase and in reporting (see above)
  • [x] How should skipping some sections / generators affect the overall result of a test case?
  • [x] Does having a skipped test mean a non-zero exit code?
  • [ ] Add hook for reporters to handle explicitly skipped tests (figure out what to do with existing IStreamingReporter::skipTest)
  • [ ] Properly handle skipped tests in the various built-in reporters
  • [ ] Consider adding a CLI option for whether skipped tests should be reported or not (similar to -s)
  • [x] Should "skipped assertions" even be reported? The fact that these exist at all is a result of how test results are communicated throughout the system (i.e., a "skipped assertion" only leads to a skipped test in Catch::Totals::delta), but the terminology feels a bit weird.
  • [ ] Add documentation

Resolves #395.

psalz avatar Jan 31 '22 16:01 psalz

Codecov Report

Merging #2360 (6309b3c) into devel (52066db) will increase coverage by 0.01%. The diff coverage is 90.22%.

@@            Coverage Diff             @@
##            devel    #2360      +/-   ##
==========================================
+ Coverage   91.13%   91.15%   +0.01%     
==========================================
  Files         187      187              
  Lines        7634     7691      +57     
==========================================
+ Hits         6957     7010      +53     
- Misses        677      681       +4     

codecov[bot] avatar Jan 31 '22 17:01 codecov[bot]

This is going to take a while to review; I don't think I will really get to this before next weekend.

horenmar avatar Feb 06 '22 22:02 horenmar

I think that skipped tests should not count as failures, nor should they count as successes. Skipping a test case from within the TEST_CASE should behave similarly to not running the test at all.

This has some consequences, such as: if a test run only invokes skipped tests, then it should fail (because Catch2 fails a test run that runs no test cases), but as long as there is at least one test case that finished, the return code should be 0.

However, it might be useful to add a warning (-w) option that would cause the test run to fail if any tests were skipped, as essentially an "audit" option.

From this also follows what should happen when a section/generator input is skipped: it should look as though that section/generator input does not exist, at least as much as possible. If the test case looks like this:

TEST_CASE() {
    CHECK(1 == 2);
    SKIP();
    REQUIRE(1 == 2);
}

then that test case should be counted as failed.

Further examples:

TEST_CASE() {
    SECTION("A") { std::cout << "A\n"; }
    SECTION("B") { SKIP(); }
    SECTION("C") { std::cout << "C\n"; }
}

Should print out "A\n" and "C\n", and similarly

TEST_CASE() {
    SECTION("A") { std::cout << "A"; }
    SECTION("B") {
        SECTION("B1") { std::cout << "B1"; }
        SECTION("B2") { SKIP(); }
    }
    std::cout << "\n";
}

should print out "A\n" and "B1\n", and nothing more. (Generally, exits from the last section in a test case are tricky, due to how sections are discovered and tracked.)

As to the other parts, "skipped assertions" should definitely not be reported as assertions. However, their test cases should be reported as skipped. I don't believe we need a flag to show them, as they should be shown by default in most reporters. I am willing to change my mind on this, though.

Also I have no idea what to do about the existing skipTest uses -- I am not sure anything actually uses it for anything useful (I am not entirely sold on --abort being useful either).

horenmar avatar Feb 17 '22 15:02 horenmar

If the section tracking proves troublesome, I am willing to say that test skips can only happen at the top of a TEST_CASE, but I would rather not, or at least allow

TEST_CASE() {
    auto i = GENERATE( ... );
    if (i % 2 == 1) { SKIP(); }
}

horenmar avatar Feb 17 '22 15:02 horenmar

Thank you, I will need some time to investigate how to best address all your points.

Also I have no idea what to do about the existing skipTest uses -- I am not sure anything actually uses it for anything useful (I am not entirely sold on --abort being useful either).

Would you in principle be open to repurposing that hook for the SKIP macro in a breaking change?

psalz avatar Feb 21 '22 17:02 psalz

Would you in principle be open to repurposing that hook for the SKIP macro in a breaking change?

Yes, but it would also likely need to be renamed and further changed, as the SKIP granularity is at section level, not test case level.

horenmar avatar Feb 21 '22 17:02 horenmar

Sorry for the delay. After looking back into this now, it seems like not too many things need changing after all.

I've rebased onto current devel and removed the printing of "skipped assertions" from the console reporter.

Skipping Behavior

I think the initial version of this PR already mostly did what you expected in your examples. I've added a few more test cases to better cover these scenarios.

and similarly

TEST_CASE() {
    SECTION("A") { std::cout << "A"; }
    SECTION("B") {
        SECTION("B1") { std::cout << "B1"; }
        SECTION("B2") { SKIP(); }
    }
    std::cout << "\n";
}

should print out "A\n" and "B1\n", and nothing more. (Generally, exits from the last section in a test case are tricky, due to how sections are discovered and tracked.)

Not sure I'm following. If there were a FAIL instead of the SKIP in the last section, it would print A\n, B1\n, \n, right? Because as you say, exiting the last section is tricky. Why would the behavior for SKIP have to be different, then? Without having looked into it in detail, it seems like treating this differently would require a lot of work for arguably little gain, IMO.

If the section tracking proves troublesome,

Again I'm not sure I understand why I would need to do section tracking manually; by treating SKIP similarly to REQUIRE or FAIL (i.e., ResultDisposition::Normal), I don't need any custom section tracking and basically get the behavior you show in your examples for free.
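
For illustration, one plausible expansion mirrors how FAIL is built on Catch2's internal message macro; this is a sketch, not necessarily this PR's exact code:

#define SKIP( ... ) \
    INTERNAL_CATCH_MSG( "SKIP", Catch::ResultWas::ExplicitSkip, \
                        Catch::ResultDisposition::Normal, __VA_ARGS__ )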

Exit Code

I've adjusted things to return a non-zero exit code (4) when tests were run but all of them ended up being skipped. Do we want something analogous to --allow-running-no-tests here? E.g., --allow-skipping-all-tests.
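
For illustration, assuming a filter that matches only skipping test cases (test name borrowed from the examples further down; output elided), a run then behaves like this:

$ ./tests/SelfTest "tests can be skipped dynamically at runtime"
$ echo $?
4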

I've also added a new test case for this behavior; however, I'm not sure whether "ExtraTests" is the right location for it. I also have no idea what the whole X#-<name>.cpp naming scheme is about; I basically just did whatever the EmptySpecWithNoTestsFails test was doing.

Hooks

Also I have no idea what to do about the existing skipTest uses -- I am not sure anything actually uses it for anything useful (I am not entirely sold on --abort being useful either).

Turns out I don't actually have a use case for a hook when SKIPping a test, so I'd rather wait for one to come up before unnecessarily polluting the API. At the same time, we can't keep the existing skipTest hook, as it would be extremely confusing. So my current thinking is to just remove it. Alternatively, we could rename it to abortTest or something.

psalz avatar Apr 13 '22 12:04 psalz

I'd be happy to see this feature in Catch2 at some point in time. I have a similar use case, namely skipping some test cases at runtime if certain resources (in my case some attached sensors) are not available.

michael-hartmann avatar Sep 29 '22 19:09 michael-hartmann

I am finally back around for long enough to go through the larger PRs.

I rebased the PR and will pick it up during this week. Sorry for the long wait 😃

horenmar avatar Oct 01 '22 19:10 horenmar

Thanks for looking into this again, let me know if you need anything!

psalz avatar Oct 03 '22 07:10 psalz

@psalz I read through it again today; I think the changes are fine, but the other reporters need better support for skipping.

I tried a few, and the output is not very useful, e.g.

$ ./tests/SelfTest  "tests can be skipped dynamically at runtime" -r tap
# rng-seed: 3838586559
# tests can be skipped dynamically at runtime
** internal error ** 1 -
1..1

An internal error in place of the skip message is no good.

$ ./tests/SelfTest  "tests can be skipped dynamically at runtime" -r compact
Filters: tests can be skipped dynamically at runtime
RNG seed: 2657692748
Passed all 0 test cases with 0 assertions.

The "0 passed", all-green output is confusing, especially when combined with the return code being 4 (because all selected tests were skipped).

$ ./tests/SelfTest  "tests can be skipped dynamically at runtime" -r xml
<?xml version="1.0" encoding="UTF-8"?>
<Catch2TestRun name="SelfTest" rng-seed="371487568" catch2-version="3.1.0" filters="tests can be skipped dynamically at runtime">
  <TestCase name="tests can be skipped dynamically at runtime" tags="[skipping]" filename="/mnt/c/ubuntu/Catch2-psalz/tests/SelfTest/UsageTests/Skip.tests.cpp" line="13">
    <OverallResult success="true"/>
  </TestCase>
  <OverallResults successes="0" failures="0" expectedFailures="0"/>
  <OverallResultsCases successes="0" failures="0" expectedFailures="0"/>
</Catch2TestRun>

The XML reporter should provide all available information, because it implements a Catch2-specific output format meant for machine parsing of the output.

$ ./tests/SelfTest  "tests can be skipped dynamically at runtime" -r Automake
:test-result: XFAIL tests can be skipped dynamically at runtime

I am not sure if this is the best we can do, see below.

In total, there are 8 built-in reporters. 3 of them are Catch2-specific:

  • [x] console - I think this one is fine as-is, except for better colour choices. I will work on that later, because I also need to change how some of the colours are defined.
  • [ ] compact - We can definitely do better here. The compact reporter output has another related issue (#878) that could be tackled at the same time (but you don't have to).
  • [ ] XML - The overall XML structure cannot change, but we can extend it trivially. The goal of this reporter is to make the test results machine-parseable, so they need to mention skipping as well.

And 5 implement specific formats we do not control. ~~Those might be an issue, because the output format might not support the concept of skipping tests.~~ Nope, they all support skipped tests.

horenmar avatar Oct 03 '22 12:10 horenmar

I also went back to the original post (this is a very high quality PR, btw), and regarding

Consider adding a CLI option for whether skipped tests should be reported or not (similar to -s)

I don't think this is useful. While skips are not counted as errors, I think we should report them by default, as they are an indicator of a potential issue. E.g., if the dynamic check for whether the test should be skipped is wrong, the user might end up with a test that never actually runs.

OTOH, there might be some use for an option that turns skipped tests into failures, instead of passes, for similar reasons.

horenmar avatar Oct 03 '22 12:10 horenmar

I started looking at the reporter formats.

TeamCity - I think the Ignored tests message matches up with the intent of SKIP, see https://www.jetbrains.com/help/teamcity/service-messages.html#Nested+Test+Reporting, Ctrl+F to "Ignored tests".
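
For reference, the service message that SKIP would map onto looks roughly like this; the name and message values are illustrative:

##teamcity[testIgnored name='tests can be skipped dynamically at runtime' message='SKIP() called']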


Went through all the 3rd party format reporters, and I am pleasantly surprised that all of them support the concept of reporting skipped tests. See my previous post with the list of reporters for notes.

horenmar avatar Oct 25 '22 19:10 horenmar

Re the flag to turn SKIPs into failures: I thought about it, and I think a new warning flag is the simplest (and cleanest) solution.

horenmar avatar Oct 27 '22 19:10 horenmar

Rebased onto #2554. Apologies for being a bit slow on this right now, I will try to find some time to look into the other reporters.

psalz avatar Nov 03 '22 11:11 psalz

Don't worry, you will have to be waaaay slower before the PR has spent more time waiting on you than it has waiting on me.

horenmar avatar Nov 03 '22 21:11 horenmar

Do you have an idea of when you will have time to finish this? I am planning the next release and wondering whether I should wait for this or not.

horenmar avatar Nov 04 '22 19:11 horenmar

Tough to say, I have a pretty busy month ahead unfortunately. Maybe I can find some time next weekend. I briefly looked into it yesterday and ran into a couple of issues where I need clarification:

  • As mentioned above, the Automake reporter currently uses the SKIP result to report test cases that were not run after reaching the --abortx limit. Other reporters don't use it but there are some comments talking about how this should be implemented. I propose to instead, in a breaking change, get rid of the IEventListener::skipTest hook. If a test run aborts prematurely, any remaining tests are simply not included in the report.
  • The JUnit, SonarQube and TeamCity reporters currently all use their "skipped" result type to report !mayfails. Since there is presumably no better option to express this, we'll have to use it for both !mayfail and explicit skips going forward.
  • While it doesn't seem to be too difficult to implement skipped tests in most of these reporters, how do I know whether the produced files are still valid? Is there some kind of infrastructure in place to validate that the format is actually correct? Or do I have to set up SonarQube and TeamCity installs etc...? (I'd rather not.) I find it pretty ironic, by the way, that none of these platforms with their XML results seem to make use of the only redeeming quality of this awful format, namely providing a schema for validation.

psalz avatar Nov 07 '22 08:11 psalz

No problem, I'll pull in things I want to have in the next release and plan for this to be in a later one.

  • Sure, skipTest can be marked deprecated, and eventually removed. The removal will take a while though, as I don't plan to have another breaking release soon.
  • Yes. Honestly [!mayfail] is a weird attribute, because the test code runs, but the result is a failure only in the case of hard-crash. This then isn't semantically supported by many formats.
  • My approach to testing these is that if I can't find a validator with a reasonable amount of effort, I just compare the output with the spec (if I can even find it; JUnit is terrible about this), and if it looks right, merge it and wait to see if someone complains. 🤷

horenmar avatar Nov 07 '22 10:11 horenmar

Okay, back with some time to work on this! I've rebased onto current devel and added support for skipped tests in all reporters. Here are some notes on each:

  • Compact: The compact reporter now also prints a line for each call to SKIP.
  • XML: I've added a new boolean attribute skipped to each section's <OverallResults>, as well as a skips count to each test case's <OverallResult>. Note that a section that fails e.g. a CHECK and then SKIPs is counted as both failed and skipped. The top-level overall results now also report a skips count. Other than that, I've introduced a new <Skip> node, analogous to Failure etc. (see the sketch after this list).
  • Automake: Automake has support for SKIP results, so adding that was simple.
  • JUnit: I found two different schemas for the JUnit format online. According to this one, there's only one <skipped> / <failure> / <error> allowed per <testcase>. However, since the current reporter produces multiple of them, I assume it instead follows this schema, which allows any number of <failure> and <error>. Unfortunately, that one only allows at most one <skipped>, which is a problem if a test case is both skipped and marked as !mayfail. I've decided not to handle this explicitly, instead producing two <skipped> nodes - let's see if someone complains. The same is true for test cases that skip multiple times for different generator values; see for example the test case "dynamic skipping works with generators". The test case "failing in some unskipped sections causes entire test case to fail" behaves differently for the JUnit reporter, since individual sections are reported as separate test cases, so in this case we'll have one skipped and one failing "test". Btw, there seems to be an inconsistency here, because tests with generators still cause the "tests" count on <testsuite> to be incremented by one for each value, but only a single <testcase> is written.
  • SonarQube: As mentioned before, SonarQube doesn't provide a proper schema for their format -- only an example -- and from what I can tell, the existing reporter is not conformant to that. No idea if there can be multiple <skipped> nodes per <testCase>, but analogous to JUnit, that's what we have now.
  • TAP: TAP simply prints a list of all assertions, so for SKIPs I'm printing an ok result followed by the # SKIP directive (see the sketch after this list).
  • TeamCity: I'm using the testIgnored message, as suggested. I'm not sure, however, what it means when a test has both a testFailed and a testIgnored message, i.e., whether the test will be considered failed or ignored in that case. I'm currently outputting both when a test case has e.g. a failing CHECK followed by a SKIP (for example in "failed assertions before SKIP cause test case to fail").
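
To make the above concrete, here is roughly the output shape for the XML, TAP, and Automake reporters; the element placement follows the descriptions above, but the exact values and messages are illustrative:

<TestCase name="tests can be skipped dynamically at runtime">
  <Skip>no GPUs are available</Skip>
  <OverallResult success="true" skips="1"/>
</TestCase>

ok 1 - no GPUs are available # SKIP
1..1

:test-result: SKIP tests can be skipped dynamically at runtime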

psalz avatar Dec 21 '22 12:12 psalz

Nice, just yesterday I was thinking that when I find some time, I should finish this PR.

Some preliminary notes

  • For TeamCity, I don't see any of the testIgnored messages in the approval tests output. I have no idea what happens when a test both fails and is skipped either, but I think this can be solved later.
  • JUnit: There is already an open issue about multiple fails reported if there are multiple failing CHECKs in a test case. My answer for that was "I might fix it, but use REQUIREs in the meantime if you need it", and I think this is the correct answer here as well.
  • ~~XML reporter: the skipped attribute does not get written as bool, due to ScopedElement handling this badly. I'll fix it in the main branch and you can pick/rebase the commit here.~~ Nevermind on this one, I misread the approval tests.

horenmar avatar Dec 21 '22 22:12 horenmar

For TeamCity, I don't see any of the testIgnored messages in the approval tests output. I have no idea what happens when a test both fails and is skipped either, but I think this can be solved later.

Right, it seems that's due to the approval test script being a bit overzealous in stripping out file paths, though. You'll notice that the existing TeamCity baselines already don't include any testIgnored lines, or testFailed, for that matter.

psalz avatar Dec 22 '22 16:12 psalz

Hmmm, you are right. Should be fixed on devel now 😃

horenmar avatar Jan 01 '23 19:01 horenmar

I looked at the SonarQube reporter. Do you think the current reporter does not obey the example outside the extended result tag?

It was originally contributed by someone working with SonarQube, so I assume that works 🤷


Otherwise I think this is done after a rebase for the new approvals. (Never mind, docs need to be done first.)

horenmar avatar Jan 01 '23 22:01 horenmar

I've rebased onto current devel and added some documentation. I've also deprecated IEventListener::skipTest by adding a comment to the function, as well as adding it to docs/deprecations.md. Should it also be deprecated in code, e.g. through a @deprecated comment, or even a [[deprecated]] attribute?

I've also reformatted Skip.tests.cpp to be in accordance with the new clang-format settings. I did not update the other code changes (e.g. to reporters), as the corresponding files all still seem to match the old style better.

I looked at the SonarQube reporter. Do you think the current reporter does not obey the example outside the extended result tag?

No I think then it's fine, I was referring to the extended result tag.

psalz avatar Jan 09 '23 15:01 psalz

I think this is done. I wrote some follow-ups onto my TODO list, e.g.

Should it also be deprecated in code, e.g. through a @deprecated comment, or even a [[deprecated]] attribute?

Yes, but it needs to be configurable, so the deprecation can be disabled via preprocessor, letting the users deal with this when they can. I'd prefer that to be done later, especially since there is more stuff in Catch2 that will get this treatment.


Do you want to clean up the commits, or are you fine with me squashing the PR?

horenmar avatar Jan 11 '23 09:01 horenmar

Fine with me, squash away! :)

psalz avatar Jan 12 '23 13:01 psalz