
Add dedicated testing workflow for Roslyn sourcecode generators

Open · psfinaki opened this issue 2 years ago · 12 comments

Describe the bug

So we heavily use source generation in our projects and we also want to mutation test the generators themselves. Our generators are very complex, and part of their logic is correctly mutation tested. However, the emission part is suffering.

See the repro: Repro.zip. The generator looks like this: [screenshot of the generator code]

The tests we have basically just try to invoke source generation and verify that it compiles.
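
For reference, a minimal sketch of what such a test can look like, assuming xUnit and the Microsoft.CodeAnalysis.CSharp APIs; MyGenerator is a hypothetical stand-in for the real generator:

```csharp
// Sketch only: MyGenerator is a placeholder, and the metadata references
// are the bare minimum needed for the example to compile.
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Xunit;

public class GeneratorCompilationTests
{
    [Fact]
    public void Generated_code_compiles_without_errors()
    {
        var input = CSharpSyntaxTree.ParseText("public partial class Sample { }");
        var compilation = CSharpCompilation.Create(
            "tests",
            new[] { input },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // Run the generator and fold its output into a new compilation.
        var driver = CSharpGeneratorDriver.Create(new MyGenerator());
        driver.RunGeneratorsAndUpdateCompilation(
            compilation, out var outputCompilation, out var generatorDiagnostics);

        // A mutant that makes the generator emit broken code fails one of these.
        Assert.Empty(generatorDiagnostics.Where(d => d.Severity == DiagnosticSeverity.Error));
        Assert.Empty(outputCompilation.GetDiagnostics()
            .Where(d => d.Severity == DiagnosticSeverity.Error));
    }
}
```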

However, for any of the mutants above the compilation would fail miserably. E.g. like this for the second mutant: [screenshot of the compile errors]

Expected behavior

I think these mutants should be marked as compile errors. Maybe they can just be marked as killed; after all, it's the presence of tests that kills them. Either way, they shouldn't survive.

Desktop

  • OS: Windows
  • Type of project: .NET Core
  • Framework Version: .NET 6, .NET Standard 2.0
  • Stryker Version: 1.5.0

psfinaki avatar Apr 08 '22 13:04 psfinaki

I hope you realize how complex this is likely to be :p

If I understand you correctly the mutation in the generator is causing compilation errors in the emitted code, right? So we would have to be able to figure out that a piece of code was generated by a specific generator, and then roll back mutations in the generator. The issue is not so much marking the mutations as killed, but figuring out how to remove the bad mutation so we can actually successfully compile.

rouke-broersma avatar Apr 08 '22 13:04 rouke-broersma

To add a bit more color:

  • We have custom source generators that are both built and used as part of our project.
  • We have two kinds of test suites for the generators:
    1. Classic unit tests that exist primarily to test error handling in the generator logic
    2. Generated code tests that exist to ensure that the generated code has the expected semantics. This ends up being implemented by having a test project that uses the code generator to build itself, and then we run tests against the generated code within the project (see the project-reference sketch below).
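
For context, a test project that "uses the code generator to build itself" typically references the generator project as an analyzer; a sketch of the relevant .csproj fragment, with a hypothetical project name:

```xml
<!-- In the test project's .csproj; MyGenerators.csproj is hypothetical. -->
<ItemGroup>
  <ProjectReference Include="..\MyGenerators\MyGenerators.csproj"
                    OutputItemType="Analyzer"
                    ReferenceOutputAssembly="false" />
</ItemGroup>
```

With this reference, the generator runs during the test project's own compilation, so the tests exercise the real generated code.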

We don't have "golden files" tests that ensure a given input always results in the same generated code. Instead, we are testing the semantics of the generated code, which is more meaningful to us in general (who cares if the formatting of the generated code changes, for example).

In terms of code coverage, we achieve 100% total coverage here in two ways:

  • We have 100% coverage of the generator code, through the unit tests combined with the generator's own use in producing the generated test code.
  • We cover 100% of the generated code with its own unit tests.

Apologies if this is a bit hard to follow. It was also hard to write the code in the first place :-)

geeknoid avatar Apr 08 '22 13:04 geeknoid

Makes perfect sense and I think mutations are very useful in this context!

rouke-broersma avatar Apr 08 '22 14:04 rouke-broersma

Actually, this is not a bug, but a feature request, and quite a significant one. Your testing workflow is very specific, as it implies building the test project for each test, which is not supported by the dynamic mutation switching strategy Stryker uses. Here, the mutation workflow should be:

  • generate the mutations (for the generator)
  • build the generator
  • for each mutation of the generator:
    • build the test assembly
    • mark the mutant as killed if the test assembly cannot be built
    • run the tests
    • mark the mutant as killed if the tests fail

This is doable, but building the test assembly with the mutated generator may prove tricky. I am afraid the main problem will be the run time: test coverage analysis makes no sense here, and there is a build step for each test run. You are looking at multi-hour test sessions, up to multiple days, depending on the complexity of the code generator(s).

I forgot: we also need to adjust the mutation control logic for this, probably going back to the original environment variable.

dupdob avatar Apr 08 '22 14:04 dupdob

@dupdob right, source generators are run at compile time, not at runtime, so the mutation is in essence in a static context: the generator with the next mutation won't run until recompilation... I didn't realize that before.

Perhaps we should skip mutating source generators by default until we have the capability to run the workflow you suggest, because they will potentially influence the mutation test result in unpredictable ways.

rouke-broersma avatar Apr 08 '22 14:04 rouke-broersma

As of now, Stryker does support mutation testing for generators, assuming those are tested with the classical test workflow; this can be done via the Roslyn test helper classes. But I recognize this testing feels awkward. If you use only the build-based workflow, Stryker will report that no mutant is covered by any test (and indeed, this is what happens with the repro project, where coverage analysis was disabled via configuration in an attempt to fix the core problem).

That being said, we should probably issue a warning when the project to be mutated is reported as an analyzer dependency (and not a normal one). The warning should state that this is not supported (yet??) and results will not match expectations.

Also, I have been thinking for a while about another feature: Stryker could raise a warning when a mutant was not executed during its test phase (reusing the coverage logic). When this happens, it means there is some dynamic change at build and/or run time that we cannot control, which likely leads to flawed results.

dupdob avatar Apr 08 '22 15:04 dupdob

Yes, this issue finally made me look deeply into the dynamic mutation switching strategy Stryker implements :D And I agree it can be classified as a feature request.

> Also, I have been thinking for a while about another feature: Stryker could raise a warning when a mutant was not executed during its test phase (reusing the coverage logic). When this happens, it means there is some dynamic change at build and/or run time that we cannot control, which likely leads to flawed results.

I like this idea - maybe such a mutant could be ignored by default? In general, this can be viewed in the light of Stryker's engine evolution. We cannot handle these mutants right now, so at the least we shouldn't distort the final mutation score. One day we'll implement the flow described above, these mutants will be handled, and the engine will naturally improve; whoever moves to the version with the improved engine will then have to deal with the resulting change in mutation score.

psfinaki avatar Apr 08 '22 15:04 psfinaki

I have an idea that would likely address the problem.

I mentioned we have the tests to validate that the generated code builds and runs with the right semantics. I suggest we use this to bootstrap a new test suite. Here's how this would work:

  1. Run the existing generator tests. As part of running the generator, the build emits the generated files in the obj directory.

  2. Create a new test suite which consumes the files generated above and uses those as golden files.

  3. The new test suite reads the source classes used in step 1 above, runs the generator as part of the test suite, and then compares the output with the files read in step 2.

So any mutation to the generator code that affects the output the generator produces will be detected. And we're still not manually keeping golden files up to date, and we can easily tolerate simple (intentional) variation in the output over time.
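
A sketch of what step 3 can look like, assuming the generated files from step 1 were captured via the EmitCompilerGeneratedFiles MSBuild property (which makes the compiler write generated sources under obj/); MyGenerator, the paths, and the file names are all hypothetical:

```csharp
// Sketch only: MyGenerator, paths and file names are placeholders.
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Xunit;

public class GeneratorGoldenFileTests
{
    [Fact]
    public void Generator_output_matches_bootstrapped_golden_file()
    {
        // The input source compiled in step 1, and the file the build
        // emitted into obj/ for it (copied out as the golden file).
        var source = File.ReadAllText("Inputs/Sample.cs");
        var golden = File.ReadAllText("Golden/Sample.MyGenerator.g.cs");

        var compilation = CSharpCompilation.Create(
            "tests",
            new[] { CSharpSyntaxTree.ParseText(source) },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

        // Run the generator in-memory against the same input.
        var driver = CSharpGeneratorDriver.Create(new MyGenerator())
            .RunGenerators(compilation);

        var generated = driver.GetRunResult().Results
            .Single().GeneratedSources.Single().SourceText.ToString();

        // Any mutation that changes what the generator emits is caught here.
        Assert.Equal(golden, generated);
    }
}
```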

geeknoid avatar Apr 08 '22 16:04 geeknoid

Just confirming that the strategy described above worked for us, we now have decent mutation scores for generators as well!

psfinaki avatar Apr 25 '22 11:04 psfinaki

Should we, in that case, change this feature request into writing some documentation on testing with source generators? Or would dedicated support for source generators in Stryker be better than your current workflow?

rouke-broersma avatar Apr 25 '22 11:04 rouke-broersma

Hard to say. Theoretically it's not "clean", as it requires quite some effort on the client side. But then it's not very "dirty" either: once this effort is made, one doesn't need to update golden files with each PR, although there still might be some hardcoded things in the code.

At the same time, the "clean" workflow, without the client effort, would probably make the execution dramatically longer, and not everyone is going to be happy about that. So I don't know :D

psfinaki avatar Apr 25 '22 12:04 psfinaki

I'd be satisfied with docs. There are only so many code generators in the world. Although it was hard to figure out how to do this testing, in the end it wasn't hard to implement. So if you provide the how-to, that should be enough IMO.

geeknoid avatar Apr 25 '22 13:04 geeknoid

Personally, I'd be happy with even longer compilation times if that were opt-in, even via some "hack", because I want to use Stryker for checking the completeness of tests, so waiting a long time may be justified. Because Stryker appears to operate on source code, this indirectly (from my short observation) leads to issues like #1923, which sometimes completely block the testing.

kant2002 avatar Oct 24 '22 08:10 kant2002