
Non-tested statements can be reported as "survived" (🙁) instead of "no tests" (🫥)

Otto-AA opened this issue 6 months ago

Based on https://github.com/boxed/mutmut/issues/389#issuecomment-2927829497

For a setup like this:

# my_lib
def hello() -> str:
    return "Hello from my-lib!"
    print('hello')

# test_my_lib
from my_lib import hello

def test_hello():
    assert hello() == "Hello from my-lib!"

We would create mutants for the print('hello') and report them as survived. However, this statement is not (and cannot be) executed by the tests, so it should be reported as "no tests".

The cause of this issue is that we collect coverage only at the function level, not at the statement/expression level. So currently, with the stats collected by _mutmut_trampoline, we know that hello() has been executed, but we do not know which statements inside hello have been executed.
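
To illustrate the difference, here is a stdlib-only sketch (not mutmut's code): statement-level tracing via sys.settrace, the same mechanism coverage.py builds on, records exactly which lines ran, so the dead print line simply never appears:

```python
import sys

executed_lines = set()

def tracer(frame, event, arg):
    # Record every executed line as (function name, line number).
    if event == "line":
        executed_lines.add((frame.f_code.co_name, frame.f_lineno))
    return tracer

def hello() -> str:
    return "Hello from my-lib!"
    print("hello")  # unreachable: will never show up in executed_lines

sys.settrace(tracer)
hello()
sys.settrace(None)

# Only the `return` line of hello() was recorded; the print line (one
# below it) is absent, so a statement-level tracker could classify its
# mutants as "no tests" instead of "survived".
hello_lines = {lineno for (name, lineno) in executed_lines if name == "hello"}
```
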

Otto-AA avatar Jun 20 '25 14:06 Otto-AA

I currently see two approaches:

  1. use a better coverage mechanism (e.g. use coverage.py instead of our custom coverage-tracking)
  2. do not mutate likely unreachable code (assert False statements, assert_never(), maybe more)

The first one seems better, as it is not based on a heuristic that defines which code is untested, but it is likely more effort to implement.
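
For comparison, approach 2 could look roughly like this (a hypothetical sketch, not mutmut's actual implementation): an AST pass that flags statements following a return/raise/break/continue in the same block as likely unreachable, which could be extended to cover assert False and assert_never() as well:

```python
import ast

# Statements after any of these in the same block cannot execute.
TERMINATORS = (ast.Return, ast.Raise, ast.Continue, ast.Break)

def likely_unreachable_lines(source: str) -> set:
    """Return line numbers of statements that follow a terminator."""
    unreachable = set()
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue
        dead = False
        for stmt in body:
            if dead:
                unreachable.add(stmt.lineno)
            elif isinstance(stmt, TERMINATORS):
                dead = True
    return unreachable

src = '''\
def hello():
    return "Hello from my-lib!"
    print("hello")
'''
# The print on line 3 is flagged; mutmut could then skip mutating it.
```
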

Otto-AA avatar Jun 20 '25 15:06 Otto-AA

The reason I implemented the custom method to track which functions are called, instead of using coverage, is that coverage used to be slow. I think that's no longer the case with the latest updates, although some testing to verify that might be in order.

boxed avatar Jun 21 '25 07:06 boxed

Yes, we should test performance on some repositories before merging a switch to coverage.py. I would guess that it reduces the overall runtime in projects that do not have 100% coverage, because we can skip more untested mutants and thus skip multiple complete test runs. But let's see how it works out in practice.

I don't think I have time to work on this in the next weeks, so if someone else wants to give it a shot, go for it! Some starting pointers:

  • currently we collect stats here: https://github.com/boxed/mutmut/blob/2710f93fbd24c10b1a988114d9b286d5265bc28c/mutmut/main.py#L712
  • while running the tests, when a mutated function is executed in the stats collection phase, we record that it was executed here: https://github.com/boxed/mutmut/blob/2710f93fbd24c10b1a988114d9b286d5265bc28c/mutmut/trampoline_templates.py#L61
  • in the future, we likely want to (but feel free to suggest an alternative implementation):
    • remove the elif mutant_under_test == 'stats': special case
    • start coverage.py before the test run and stop it afterwards (https://coverage.readthedocs.io/en/latest/api.html; probably with dynamic contexts to track which test executed which mutants). Maybe we would need to run it on the non-mutated code, so we can easily cross-reference the line numbers with the mutants.
    • somehow map the covered lines/statements/branches to mutants and store it in the stats
    • somehow map the mutants to the tests that execute them (so we can later on only run a subset of tests for each mutant)

Otto-AA avatar Jun 21 '25 10:06 Otto-AA

I am going to try and implement this because I have the same issue. I have low coverage on my project, with many functions covered but not every line in them. I don't want those uncovered lines to be mutated; not just for speed, but mostly because the report on them is useless to me (and they dominate the report).

I will use the guide above -- if there's anything more you'd like to add, let me know.

loufranco avatar Sep 09 '25 15:09 loufranco

Thanks for taking a look at this!

Something I'm not sure about yet is whether coverage.py can be used to get the stack depth (for the max_stack_depth config option). Currently we track this here: https://github.com/boxed/mutmut/blob/2710f93fbd24c10b1a988114d9b286d5265bc28c/mutmut/main.py#L131
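
As a rough illustration of why this is awkward with coverage.py alone: coverage.py reports which lines ran, not how deep in the call stack they ran. The current tracking amounts to something like counting frames (a hypothetical stdlib sketch, not mutmut's code):

```python
import sys

def current_stack_depth() -> int:
    # Count frames from our caller to the bottom of the stack.
    # sys._getframe is CPython-specific, used here only for illustration.
    depth = 0
    frame = sys._getframe(1)
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth

def inner():
    return current_stack_depth()

def outer():
    # One call level deeper than calling inner() directly.
    return inner()
```

A max_stack_depth check would then compare this depth against the configured limit before counting a mutant as covered.
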

If we cannot implement max_stack_depth with coverage.py, we probably want to keep the current stats collection in addition to an (optional?) coverage.py tracking. I think it would still be good to fill tests_by_mangled_function_name based on the more precise coverage results, but I'm not 100% sure this is necessary.

Also, I'm not sure whether we should run coverage.py on the original source code or on the already mutated one. I would tend towards running it before mutating. If we ran it afterwards, (1) mapping covered line numbers to mutant names could be tricky, and (2) we would generate useless mutants. See ARCHITECTURE.rst for info on the current phases.

And another pointer: we already use line numbers to honour the # pragma: no mutate comments. The logic to skip mutants based on the coverage could be similar, except that you need to read the data from some coverage report.
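
That skip logic could look roughly like this (hypothetical names, not mutmut's actual API): uncovered lines from a coverage report are treated exactly like lines carrying # pragma: no mutate:

```python
from typing import Optional

def should_skip_mutant(lineno: int,
                       pragma_no_mutate_lines: set,
                       covered_lines: Optional[set]) -> bool:
    """Skip a mutant if its line has a pragma or is not covered."""
    if lineno in pragma_no_mutate_lines:
        return True
    # When coverage data is available, an uncovered line is skipped too;
    # covered_lines=None means no coverage report was supplied.
    if covered_lines is not None and lineno not in covered_lines:
        return True
    return False
```
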

Otto-AA avatar Sep 09 '25 16:09 Otto-AA

If this gets too complex, I would suggest simply using the results of coverage.py like an automatic # pragma: no mutate on the uncovered lines, and keeping the internals of mutmut as they are (e.g. not touching the stats collection).

I think that would already be a good improvement and easier to implement :)

Otto-AA avatar Sep 09 '25 16:09 Otto-AA

That makes a lot of sense to me (the automatic pragma: no mutate based on running coverage on the original source).

loufranco avatar Sep 09 '25 20:09 loufranco

I made #438 to address this. It works on my project (and its tests), but let me know if there are things I need to incorporate.

loufranco avatar Sep 11 '25 15:09 loufranco