coveragepy Inconsistent report when combining reports across Python versions

Open freakboy3742 opened this issue 1 year ago • 4 comments

Describe the bug

Combining reports across different versions of Python yields inconsistent coverage results, depending on the version of Python that was used to produce the report.

To Reproduce

I've tried, but have been unsuccessful at reducing this to a minimal example; building an isolated version of the lines that are the source of the problem makes the problem go away. It manifests running the test suite of Briefcase on this PR, at commit 6ae1b86; it's a Pytest test suite of a pure Python codebase, using unittest.mock and monkeypatch. In CI we run coverage on Python 3.8-3.12, on macOS, Windows and Linux; however, the problem manifests with just Python3.9.13 and Python3.10.9 in the mix.

To reproduce on macOS (I'm seeing the same result on Ventura on M1, and Monterey on x86_64):

$ git clone https://github.com/freakboy3742/briefcase.git
$ cd briefcase
$ git checkout 6ae1b86
$ python3.X -m venv venv   # This python version is the factor controlling the bug.
$ source ./venv/bin/activate
(venv) $ pip install tox==4.4.6 "coverage[toml]==7.2.1"
(venv) $ tox -e py39,py310
(venv) $ coverage combine
(venv) $ coverage report

If you run this on Python3.9, it reports:

(venv3.9) rkm@eunectes briefcase % coverage report
Name                                              Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------------------------------------------
src/briefcase/commands/base.py                      287      2     60      2  98.8%   184, 218
src/briefcase/commands/create.py                    346      1     98      1  99.5%   838
src/briefcase/commands/dev.py                        69      1     20      1  97.8%   151
src/briefcase/commands/open.py                       29      2     10      2  89.7%   14, 18
src/briefcase/integrations/android_sdk.py           504      0    159      2  99.7%   61->64, 312->318
src/briefcase/platforms/linux/flatpak.py            102      1      8      0  99.1%   31
src/briefcase/platforms/macOS/__init__.py           242      2     86      0  99.4%   22-25
src/briefcase/platforms/windows/__init__.py          74      2      2      0  97.4%   125-126
src/briefcase/platforms/windows/app.py               43      3      0      0  93.0%   21, 40-41
src/briefcase/platforms/windows/visualstudio.py      43      3      0      0  93.0%   21, 40-41
---------------------------------------------------------------------------------------------
TOTAL                                              5101     17   1293      8  99.6%

47 files skipped due to complete coverage.

However, if you run on Python3.10, it reports:

(venv3.10) rkm@eunectes briefcase % coverage report
Name                                              Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------------------------------------------
src/briefcase/commands/base.py                      287      2     64      2  98.9%   184, 218
src/briefcase/commands/create.py                    346      1    120      1  99.6%   838
src/briefcase/commands/dev.py                        69      1     22      1  97.8%   151
src/briefcase/commands/open.py                       29      2     10      2  89.7%   14, 18
src/briefcase/integrations/android_sdk.py           504      0    177      2  99.7%   61->64, 312->278
src/briefcase/platforms/linux/flatpak.py            102      1     16      0  99.2%   31
src/briefcase/platforms/linux/system.py             300      0    136      1  99.8%   192->202
src/briefcase/platforms/macOS/__init__.py           242      2     96      0  99.4%   22-25
src/briefcase/platforms/windows/__init__.py          74      2      8      0  97.6%   125-126
src/briefcase/platforms/windows/app.py               43      3      2      0  93.3%   21, 40-41
src/briefcase/platforms/windows/visualstudio.py      43      3      2      0  93.3%   21, 40-41
---------------------------------------------------------------------------------------------
TOTAL                                              5104     17   1516      9  99.6%

46 files skipped due to complete coverage.

Note the 3.10 version has an additional missing branch:

src/briefcase/platforms/linux/system.py             300      0    136      1  99.8%   192->202

The problematic code is this (system.py: L187-206)

        else:
            if sys.version_info >= (3, 10):
                freedesktop_info = self.tools.platform.freedesktop_os_release()
            else:
                with Path("/etc/os-release").open(encoding="utf-8") as f:
                    freedesktop_info = parse_freedesktop_os_release(f.read())

        # Process the FreeDesktop content to give the vendor, codename and vendor base.
        (
            app.target_vendor,
            app.target_codename,
            app.target_vendor_base,
        ) = self.vendor_details(freedesktop_info)

The test suite is hitting these lines; if you run a coverage report on just 3.9, you see L190 missing; if you run a report on just 3.10, you get L192-193 missing (as expected). It's only when the two reports are combined that the "L192->202" branch is apparently uncovered.

Expected behavior

Coverage reports shouldn't be dependent on the Python version used to generate them.

Additional context

Interestingly, if you use Python3.10 to report on just 3.9 coverage (i.e., using the script above, use Py3.10 to generate the venv, but only run tox -e py39), you get a really weird report:

Combined data file .coverage.eunectes.local.54651.279508
Name                                              Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------------------------------------------
src/briefcase/__main__.py                            42      0     13      2  96.4%   45->48, 45->exit
src/briefcase/commands/base.py                      287      4     64      5  97.4%   20-21, 184, 218, 326->328, 698->exit, 714->698
src/briefcase/commands/create.py                    346      1    120     15  96.6%   54->56, 56->61, 61->78, 78->exit, 284->exit, 395->exit, 396->395, 397->396, 398->396, 465->exit, 541->540, 622->exit, 717->exit, 718->717, 838
src/briefcase/commands/dev.py                        69      1     22      2  96.7%   72->exit, 151
src/briefcase/commands/new.py                       129      1     30      0  99.4%   239
src/briefcase/commands/open.py                       29      2     10      2  89.7%   14, 18
src/briefcase/console.py                            225      1     70      4  98.3%   267->exit, 273->280, 301->314, 307->301, 518
src/briefcase/integrations/android_sdk.py           504      0    177     16  97.7%   61->64, 278->318, 312->278, 313->278, 606->exit, 981->1003, 1003->exit, 1025->1033, 1026->1025, 1048->exit, 1049->1048, 1111->1143, 1114->1111, 1144->1173, 1145->1144, 1158->1162
src/briefcase/integrations/docker.py                115      0     36      2  98.7%   305->exit, 420->exit
src/briefcase/integrations/download.py               66      0     24      4  95.6%   113->128, 120->113, 121->120, 146->exit
src/briefcase/integrations/flatpak.py                79      0     18      1  99.0%   233->exit
src/briefcase/integrations/java.py                  102      0     34      2  98.5%   247->exit, 274->exit
src/briefcase/integrations/linuxdeploy.py           178      0     32      5  97.6%   66->exit, 68->66, 106->exit, 123->exit, 143->exit
src/briefcase/integrations/rcedit.py                 40      0     10      1  98.0%   67->exit
src/briefcase/integrations/subprocess.py            243      1     93      4  98.5%   130->exit, 439->443, 441->439, 612
src/briefcase/integrations/wix.py                    75      0     26      2  98.0%   134->151, 155->exit
src/briefcase/platforms/__init__.py                  12      2      4      0  87.5%   8-9
src/briefcase/platforms/android/gradle.py           161      0     40      9  95.5%   210->exit, 211->210, 232->exit, 336->340, 340->344, 344->362, 354->344, 394->exit, 412->exit
src/briefcase/platforms/iOS/xcode.py                197      0     57     11  95.7%   270->exit, 272->277, 280->270, 293->exit, 368->375, 369->368, 377->387, 391->410, 413->424, 425->449, 474->486
src/briefcase/platforms/linux/__init__.py            72      0     28      1  99.0%   120->115
src/briefcase/platforms/linux/appimage.py           115      0     18      1  99.2%   194->exit
src/briefcase/platforms/linux/flatpak.py            102      1     16      4  95.8%   31, 134->exit, 163->172, 180->exit, 239->exit
src/briefcase/platforms/linux/system.py             300      1    136     22  94.7%   177->182, 190, 192->202, 245->247, 440->459, 469->483, 483->503, 486->483, 513->527, 516->513, 528->548, 529->528, 548->exit, 668->706, 686->668, 706->exit, 742->755, 760->839, 761->760, 836->761, 839->847, 847->867
src/briefcase/platforms/macOS/__init__.py           242      2     96      7  97.3%   22-25, 172->exit, 185->exit, 366->exit, 367->366, 444->459, 446->444, 447->446
src/briefcase/platforms/macOS/xcode.py               49      0      4      1  98.1%   76->exit
src/briefcase/platforms/web/static.py               176      0     68     18  92.6%   79->87, 80->79, 87->exit, 88->87, 100->exit, 101->100, 127->130, 132->152, 152->173, 173->203, 174->173, 204->220, 215->204, 217->215, 356->362, 388->exit, 389->388, 390->389
src/briefcase/platforms/windows/__init__.py          74      2      8      3  93.9%   125-126, 137->166, 168->189, 191->exit
src/briefcase/platforms/windows/app.py               43      3      2      1  91.1%   21, 40-41, 50->exit
src/briefcase/platforms/windows/visualstudio.py      43      3      2      1  91.1%   21, 40-41, 50->exit
---------------------------------------------------------------------------------------------
TOTAL                                              5104     25   1516    146  97.4%

This report includes the "missing" 192->202 branch, along with many others. A quick survey of the missing branches on system.py shows that they are all context manager with clauses; however, they're not reported as missing branches if the report is generated on Python3.9.

My initial (mostly uneducated) guess is that the logic on Python3.10 for evaluating the list of branches is different to that on Python3.9 (possibly due to the change in context managers that allows for multiple context declarations in a single with statement?). Most of these "extra" missing branches are covered when the test suite is run on Python3.10; but the problem 192->202 branch is version specific, it won't ever run on Python3.10, and so the Python3.10 report sees missing coverage.
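One way to probe that guess is to look at the order in which line numbers start in the bytecode of a small function containing a with statement, and compare the output between interpreters. The sketch below is only an illustration: `sample` and `line_sequence` are hypothetical names, and the result depends on the interpreter, so no expected output is given.

```python
import dis
from contextlib import nullcontext


def sample():
    with nullcontext():
        value = 1
    return value


def line_sequence(func):
    """Return the bytecode line starts of func, relative to its def line."""
    first = func.__code__.co_firstlineno
    return [
        lineno - first
        for _offset, lineno in dis.findlinestarts(func.__code__)
        if lineno is not None  # some interpreters can report entries without a line
    ]


if __name__ == "__main__":
    # Run this under each interpreter (e.g. python3.9 and python3.10) and
    # compare. If the guess holds, the `with` line (relative line 1) would be
    # expected to show up again after the body on the newer interpreter.
    print(line_sequence(sample))
```

If the sequences differ between versions, that would be consistent with the two interpreters disagreeing about which line transitions a with statement can produce.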

Mar 05 '23 04:03 freakboy3742

Seeing the same thing. Tried and failed at getting a repro. As soon as I reduced it to a single file using with statements and a test file, the problem goes away. Also threw xdist in the mix since that's where we are seeing this problem manifest. I even generated 10,000 fake tests and ran them all in parallel to try and rule out xdist as a culprit.

Failed reproducer in the details.

deps

coverage==6.4.2
pytest==7.1.2
pytest-cov==4.0.0
pytest-forked==1.4.0
pytest-xdist==3.2.1

coverage erase
pytest ./test_f.py \
    --cov \
    --cov-config=setup.cfg

./py310/Scripts/coverage report --rcfile=setup.cfg -m

Coverage config in the setup.cfg

[coverage:run]
source =
    f 
branch = true
parallel = true
concurrency = multiprocessing
sigterm = true

[coverage:report]
# Coverage report configs.
show_missing = true
# skip_covered = false
fail_under = 100
precision = 2
sort = Miss
exclude_lines =
    pragma: nocover
    except ImportError:
    except NameError:
    if __name__ == .__main__.:

[coverage:paths]
source =.

File f.py

import os


def f():
    with open("hello", "w") as f:
        f.write("world")
    os.getenv("HELLO")
    with open("hello") as f:
        thing = f.read()
    return thing


def g():
    if 1 == 1:
        return f()


def what():
    if 1 == 0:
        return g()
    else:
        return f()

test_f.py

import f


def test_f():
    assert f.f() == "world"


def test_g():
    assert f.g() == "world"

# and then 10,000 of the same thing
def test():
    assert 1 == 1


def another():
    assert 2 == 2

Apr 06 '23 02:04 jmahlik

Attempting to triage this ticket at the PyCon 2023 sprints.

Apr 24 '23 22:04 paxnovem

I worked with @paxnovem and @marcgibbons at PyCon 2023, and we were finally able to get this down to a consistently reproducible test case! 🎉

example.py:

from contextlib import nullcontext

def foo():
    with nullcontext():
        pass

foo()

Shell:

python3.11 -m coverage erase
python3.8 -m coverage run example.py
python3.11 -m coverage report

Output:

Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       5      0      2      1    86%
----------------------------------------------
TOTAL            5      0      2      1    86%

When coverage is run and the report is generated on the same version (Python 3.8 + Python 3.8 or Python 3.11 + Python 3.11), the issue does not occur and the coverage comes back as 100%. The issue appears to occur only when a version before Python 3.9 is used to run the tracer and version 3.10 or later is used to generate the report.

Apr 25 '23 22:04 kevin-brown

Tracing through the code and how this all works, we were able to determine the cause of this issue. When running coverage, the tracer only stores the lines and branches (arcs) that were actually executed during the run. When reporting on coverage, the reporter takes the recorded lines and arcs and compares them to the parsed AST to determine which lines and branches could have been executed. The bug occurs because the number of branches detected from the AST differs between versions: Python 3.11 adds 2 branches for with statements that Python 3.8 does not have. You can verify this by comparing the reporting output between the two versions.

example.py

from contextlib import nullcontext
import sys

def foo():
    if sys.version_info < (3, 10):
        with nullcontext():
            pass

foo()

Shell:

$ python3.8 -m coverage run example.py
$ python3.8 -m coverage report
Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       7      0      2      1    89%
----------------------------------------------
TOTAL            7      0      2      1    89%
$ python3.11 -m coverage run example.py
$ python3.11 -m coverage report
Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       7      2      4      1    55%
----------------------------------------------
TOTAL            7      2      4      1    55%
$ python3.11 -m coverage combine .coverage-py311 .coverage-py38 
Combined data file .coverage-py38
$ python3.8 -m coverage report
Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       7      0      2      0   100%
----------------------------------------------
TOTAL            7      0      2      0   100%
$ python3.11 -m coverage report
Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       7      0      4      1    91%
----------------------------------------------
TOTAL            7      0      4      1    91%

Looking at the "Branch" column you can see the 2 new branches in Python 3.11 that are not present in Python 3.8. Additionally, when the reports are combined together you can see the 1 missing branch for Python 3.11 (the second one for the with) but you also see that Python 3.8 treats it as a fully covered file.
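The Branch and BrPart columns above can be reproduced with a toy model of that comparison. This is a sketch only: the arc sets are illustrative assumptions about how each interpreter sees lines 5-7 of example.py (negative numbers stand for leaving the function), not data captured from coverage.py, and `branch_stats` is a hypothetical helper rather than coverage.py's own code.

```python
from collections import defaultdict


def branch_stats(possible, executed):
    """Return (branches, partial) for a set of possible and executed arcs."""
    exits = defaultdict(set)
    for src, dst in possible:
        exits[src].add(dst)
    branches = partial = 0
    for src, targets in exits.items():
        if len(targets) < 2:
            continue  # a line with a single exit is not a branch
        branches += len(targets)
        taken = sum((src, dst) in executed for dst in targets)
        if 0 < taken < len(targets):
            partial += 1
    return branches, partial


# Assumed arcs for: 5 `if`, 6 `with`, 7 `pass`; -4 means "exit foo()".
possible_38 = {(5, 6), (5, -4), (6, 7), (7, -4)}
possible_311 = {(5, 6), (5, -4), (6, 7), (7, 6), (6, -4)}  # exit goes back through `with`
executed_38 = {(5, 6), (6, 7), (7, -4)}
executed_311 = {(5, -4)}
combined = executed_38 | executed_311  # combining is modelled as a union

print(branch_stats(possible_38, executed_38))    # (2, 1)
print(branch_stats(possible_311, executed_311))  # (4, 1)
print(branch_stats(possible_38, combined))       # (2, 0)
print(branch_stats(possible_311, combined))      # (4, 1)
```

Under these assumptions, the arc 6->exit exists only in the newer interpreter's view of the file, and neither run can ever record it: 3.8 leaves the function from line 7, and 3.11 never enters the with block.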

My recommendation would be to always run the coverage report on the lowest Python version in the mix, rather than the highest, in order to avoid reporting phantom missing branches.
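A minimal sketch of that recommendation for a CI script, assuming the interpreters are named like `python3.8` and `python3.11` on the PATH; `lowest_python` and `report_command` are hypothetical helpers, not part of coverage.py.

```python
import re


def lowest_python(interpreters):
    """Pick the interpreter name with the smallest (major, minor) version."""

    def version_key(name):
        match = re.search(r"(\d+)\.(\d+)", name)
        if match is None:
            raise ValueError(f"no version found in {name!r}")
        return int(match.group(1)), int(match.group(2))

    return min(interpreters, key=version_key)


def report_command(interpreters):
    """Build the argv for running `coverage report` on the lowest version."""
    return [lowest_python(interpreters), "-m", "coverage", "report"]


print(report_command(["python3.11", "python3.8", "python3.10"]))
# ['python3.8', '-m', 'coverage', 'report']
```

The version is parsed numerically because a plain string comparison would sort `python3.10` before `python3.8`.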

Apr 25 '23 23:04 kevin-brown
