coveragepy
coveragepy copied to clipboard
Inconsistent report when combining reports across Python versions
Describe the bug
Combining reports across different versions of Python yields inconsistent coverage results, depending on the version of Python that was used to produce the report.
To Reproduce
I've tried, but have been unsuccessful at reducing this to a minimal example; building an isolated version of the lines that are the source of the problem makes the problem go away. It manifests running the test suite of Briefcase on this PR, at commit 6ae1b86; it's a Pytest test suite of a pure Python codebase, using unittest.mock and monkeypatch. In CI we run coverage on Python 3.8-3.12, on macOS, Windows and Linux; however, the problem manifests with just Python3.9.13 and Python3.10.9 in the mix.
To reproduce on macOS (I'm seeing the same result on Ventura on M1, and Monterey on x86_64):
$ git clone https://github.com/freakboy3742/briefcase.git
$ cd briefcase
$ git checkout 6ae1b86
$ python3.X -m venv venv # This python version is the factor controlling the bug.
$ source ./venv/bin/activate
(venv) $ pip install tox==4.4.6 "coverage[toml]==7.2.1"
(venv) $ tox -e py39,py310
(venv) $ coverage combine
(venv) $ coverage report
If you run this on Python3.9, it reports:
(venv3.9) rkm@eunectes briefcase % coverage report
Name Stmts Miss Branch BrPart Cover Missing
---------------------------------------------------------------------------------------------
src/briefcase/commands/base.py 287 2 60 2 98.8% 184, 218
src/briefcase/commands/create.py 346 1 98 1 99.5% 838
src/briefcase/commands/dev.py 69 1 20 1 97.8% 151
src/briefcase/commands/open.py 29 2 10 2 89.7% 14, 18
src/briefcase/integrations/android_sdk.py 504 0 159 2 99.7% 61->64, 312->318
src/briefcase/platforms/linux/flatpak.py 102 1 8 0 99.1% 31
src/briefcase/platforms/macOS/__init__.py 242 2 86 0 99.4% 22-25
src/briefcase/platforms/windows/__init__.py 74 2 2 0 97.4% 125-126
src/briefcase/platforms/windows/app.py 43 3 0 0 93.0% 21, 40-41
src/briefcase/platforms/windows/visualstudio.py 43 3 0 0 93.0% 21, 40-41
---------------------------------------------------------------------------------------------
TOTAL 5101 17 1293 8 99.6%
47 files skipped due to complete coverage.
However, if you run on Python3.10, it reports:
(venv3.10) rkm@eunectes briefcase % coverage report
Name Stmts Miss Branch BrPart Cover Missing
---------------------------------------------------------------------------------------------
src/briefcase/commands/base.py 287 2 64 2 98.9% 184, 218
src/briefcase/commands/create.py 346 1 120 1 99.6% 838
src/briefcase/commands/dev.py 69 1 22 1 97.8% 151
src/briefcase/commands/open.py 29 2 10 2 89.7% 14, 18
src/briefcase/integrations/android_sdk.py 504 0 177 2 99.7% 61->64, 312->278
src/briefcase/platforms/linux/flatpak.py 102 1 16 0 99.2% 31
src/briefcase/platforms/linux/system.py 300 0 136 1 99.8% 192->202
src/briefcase/platforms/macOS/__init__.py 242 2 96 0 99.4% 22-25
src/briefcase/platforms/windows/__init__.py 74 2 8 0 97.6% 125-126
src/briefcase/platforms/windows/app.py 43 3 2 0 93.3% 21, 40-41
src/briefcase/platforms/windows/visualstudio.py 43 3 2 0 93.3% 21, 40-41
---------------------------------------------------------------------------------------------
TOTAL 5104 17 1516 9 99.6%
46 files skipped due to complete coverage.
Note the 3.10 version has an additional missing branch:
src/briefcase/platforms/linux/system.py 300 0 136 1 99.8% 192->202
The problematic code is this (system.py: L187-206)
else:
if sys.version_info >= (3, 10):
freedesktop_info = self.tools.platform.freedesktop_os_release()
else:
with Path("/etc/os-release").open(encoding="utf-8") as f:
freedesktop_info = parse_freedesktop_os_release(f.read())
# Process the FreeDesktop content to give the vendor, codename and vendor base.
(
app.target_vendor,
app.target_codename,
app.target_vendor_base,
) = self.vendor_details(freedesktop_info)
The test suite is hitting these lines; if you run a coverage reports on just 3.9, you see L190 missing; if you run a report on just 3.10, you get L192-193 missing (as expected). It's only when the two reports are combined that the "L192->202" branch is apparently uncovered.
Expected behavior
Coverage reports shouldn't be dependent on the Python version used to generate them.
Additional context
Interestingly, if you use Python3.10 to report on just 3.9 coverage (i.e., using the script above, use Py3.10 to generate the venv, but only run tox -e py39
, you get a really weird report:
Combined data file .coverage.eunectes.local.54651.279508
Name Stmts Miss Branch BrPart Cover Missing
---------------------------------------------------------------------------------------------
src/briefcase/__main__.py 42 0 13 2 96.4% 45->48, 45->exit
src/briefcase/commands/base.py 287 4 64 5 97.4% 20-21, 184, 218, 326->328, 698->exit, 714->698
src/briefcase/commands/create.py 346 1 120 15 96.6% 54->56, 56->61, 61->78, 78->exit, 284->exit, 395->exit, 396->395, 397->396, 398->396, 465->exit, 541->540, 622->exit, 717->exit, 718->717, 838
src/briefcase/commands/dev.py 69 1 22 2 96.7% 72->exit, 151
src/briefcase/commands/new.py 129 1 30 0 99.4% 239
src/briefcase/commands/open.py 29 2 10 2 89.7% 14, 18
src/briefcase/console.py 225 1 70 4 98.3% 267->exit, 273->280, 301->314, 307->301, 518
src/briefcase/integrations/android_sdk.py 504 0 177 16 97.7% 61->64, 278->318, 312->278, 313->278, 606->exit, 981->1003, 1003->exit, 1025->1033, 1026->1025, 1048->exit, 1049->1048, 1111->1143, 1114->1111, 1144->1173, 1145->1144, 1158->1162
src/briefcase/integrations/docker.py 115 0 36 2 98.7% 305->exit, 420->exit
src/briefcase/integrations/download.py 66 0 24 4 95.6% 113->128, 120->113, 121->120, 146->exit
src/briefcase/integrations/flatpak.py 79 0 18 1 99.0% 233->exit
src/briefcase/integrations/java.py 102 0 34 2 98.5% 247->exit, 274->exit
src/briefcase/integrations/linuxdeploy.py 178 0 32 5 97.6% 66->exit, 68->66, 106->exit, 123->exit, 143->exit
src/briefcase/integrations/rcedit.py 40 0 10 1 98.0% 67->exit
src/briefcase/integrations/subprocess.py 243 1 93 4 98.5% 130->exit, 439->443, 441->439, 612
src/briefcase/integrations/wix.py 75 0 26 2 98.0% 134->151, 155->exit
src/briefcase/platforms/__init__.py 12 2 4 0 87.5% 8-9
src/briefcase/platforms/android/gradle.py 161 0 40 9 95.5% 210->exit, 211->210, 232->exit, 336->340, 340->344, 344->362, 354->344, 394->exit, 412->exit
src/briefcase/platforms/iOS/xcode.py 197 0 57 11 95.7% 270->exit, 272->277, 280->270, 293->exit, 368->375, 369->368, 377->387, 391->410, 413->424, 425->449, 474->486
src/briefcase/platforms/linux/__init__.py 72 0 28 1 99.0% 120->115
src/briefcase/platforms/linux/appimage.py 115 0 18 1 99.2% 194->exit
src/briefcase/platforms/linux/flatpak.py 102 1 16 4 95.8% 31, 134->exit, 163->172, 180->exit, 239->exit
src/briefcase/platforms/linux/system.py 300 1 136 22 94.7% 177->182, 190, 192->202, 245->247, 440->459, 469->483, 483->503, 486->483, 513->527, 516->513, 528->548, 529->528, 548->exit, 668->706, 686->668, 706->exit, 742->755, 760->839, 761->760, 836->761, 839->847, 847->867
src/briefcase/platforms/macOS/__init__.py 242 2 96 7 97.3% 22-25, 172->exit, 185->exit, 366->exit, 367->366, 444->459, 446->444, 447->446
src/briefcase/platforms/macOS/xcode.py 49 0 4 1 98.1% 76->exit
src/briefcase/platforms/web/static.py 176 0 68 18 92.6% 79->87, 80->79, 87->exit, 88->87, 100->exit, 101->100, 127->130, 132->152, 152->173, 173->203, 174->173, 204->220, 215->204, 217->215, 356->362, 388->exit, 389->388, 390->389
src/briefcase/platforms/windows/__init__.py 74 2 8 3 93.9% 125-126, 137->166, 168->189, 191->exit
src/briefcase/platforms/windows/app.py 43 3 2 1 91.1% 21, 40-41, 50->exit
src/briefcase/platforms/windows/visualstudio.py 43 3 2 1 91.1% 21, 40-41, 50->exit
---------------------------------------------------------------------------------------------
TOTAL 5104 25 1516 146 97.4%
This report includes the "missing" 192->202 branch, along with many others. A quick survey of the missing branches on system.py
shows that they are all context manager with
clauses; however, they're not reported as missing branches if the report is generated on Python3.9.
My initial (mostly uneducated) guess is that the logic on Python3.10 for evaluating the list of branches is different to that on Python3.9 (possibly due to the change in context managers that allows for multiple context declarations in a single with
statement?). Most of these "extra" missing branches are covered when the test suite is run on Python3.10; but the problem 192->202
branch is version specific, it won't ever run on Python3.10, and so the Python3.10 report sees missing coverage.
Seeing the same thing. Tried and failed at getting a repro. As soon as I reduced it to a single file using with statements and a test file, the problem goes away. Also threw xdist in the mix since that's where we are seeing this problem manifest. I even generated 10,000 fake tests and ran them all in parallel to try and rule out xdist as a culprit.
Failed reproducer in the details.
deps
coverage==6.4.2
pytest==7.1.2
pytest-cov==4.0.0
pytest-forked==1.4.0
pytest-xdist==3.2.1
coverage erase
pytest ./test_f.py \
--cov \
--cov-config=setup.cfg
./py310/Scripts/coverage report --rcfile=setup.cfg -m
Coverage config in the setup.cfg
[coverage:run]
source =
f
branch = true
parallel = true
concurrency = multiprocessing
sigterm = true
[coverage:report]
# Coverage report configs.
show_missing = true
# skip_covered = false
fail_under = 100
precision = 2
sort = Miss
exclude_lines =
pragma: nocover
except ImportError:
except NameError:
if __name__ == .__main__.:
[coverage:paths]
source =.
File f.py
def f():
with open("hello", "w") as f:
f.write("world")
os.getenv("HELLO")
with open("hello") as f:
thing = f.read()
return thing
def g():
if 1 == 1:
return f()
def what():
if 1 == 0:
return g()
else:
return f()
test_f.py
import f
def test_f():
assert f.f() == "world"
def test_g():
assert f.g() == "world"
# and then 10,000 of the same thing
def test():
assert 1 == 1
def another():
assert 2 == 2
Attempting to triage this ticket at the pycon 2023 sprints
Working with @paxnovem and @marcgibbons at PyCon 2023 and we were finally able to get this down to a consistently reproducible test case! 🎉
example.py:
from contextlib import nullcontext
def foo():
with nullcontext():
pass
foo()
shell
python3.11 coverage erase
python3.8 coverage run example.py
python3.11 coverage report
Output:
Name Stmts Miss Branch BrPart Cover
----------------------------------------------
example.py 5 0 2 1 86%
----------------------------------------------
TOTAL 5 0 2 1 86%
When coverage is run and the report is generated on the same version (Python 3.8 + Python 3.8 or Python 3.11 + Python 3.11), the issue does not occur and the coverage comes back as 100%. The issue appears to only occur using a version before Python 3.9 to run the tracer and a version of 3.10 and after the run the report.
Tracing through code and how this all performs, we were able to determine the cause of this issue. When running coverage, the tracer only stores the captures lines and branches (arcs) that were executed during the run. When reporting on coverage, the reporter takes the captured traced lines and branches (arcs) that were executed and compares it to the parsed AST to determine what lines and branches existed to be executed. The bug currently occurs because of a difference in the number of branches that are detected from the AST in Python 3.11, where there are 2 additional branches added for with
statements in Python 3.11 vs Python 3.8. You can verify this in the reporting output between the two versions.
example.py
from contextlib import nullcontext
import sys
def foo():
if sys.version_info < (3, 10):
with nullcontext():
pass
foo()
Shell:
$ python3.8 -m coverage run example.py
$ python3.8 -m coverage report
Name Stmts Miss Branch BrPart Cover
----------------------------------------------
example.py 7 0 2 1 89%
----------------------------------------------
TOTAL 7 0 2 1 89%
$ python3.11 -m coverage run example.py
$ python3.11 -m coverage report
Name Stmts Miss Branch BrPart Cover
----------------------------------------------
example.py 7 2 4 1 55%
----------------------------------------------
TOTAL 7 2 4 1 55%
$ python3.11 -m coverage combine .coverage-py311 .coverage-py38
Combined data file .coverage-py38
$ python3.8 -m coverage report
Name Stmts Miss Branch BrPart Cover
----------------------------------------------
example.py 7 0 2 0 100%
----------------------------------------------
TOTAL 7 0 2 0 100%
$ python3.11 -m coverage report
Name Stmts Miss Branch BrPart Cover
----------------------------------------------
example.py 7 0 4 1 91%
----------------------------------------------
TOTAL 7 0 4 1 91%
Looking at the "Branch" column you can see the 2 new branches in Python 3.11 that are not present in Python 3.8. Additionally, when the reports are combined together you can see the 1 missing branch for Python 3.11 (the second one for the with
) but you also see that Python 3.8 treats it as a fully covered file.
My recommendation would be to always run the coverage report on the lowest version instead of on the highest version in order to avoid missing phantom branches.