coveragepy
Race condition leading to hanging tests using coverage >=7.5.5
Describe the bug
On the waitress project we use coverage along with pytest-cov to compute coverage on all runs. Recently we received a new contribution that triggered CI across the test matrix, and some of the jobs hung in tests/test_functional.py. These tests spin up a server (with threads) using multiprocessing.
The contributor caught the issue and provided a stack trace, captured when they hit Ctrl+C on the hanging test suite:
https://github.com/Pylons/waitress/pull/446#issuecomment-2439999668
Copied in its entirety here:
platform linux -- Python 3.12.7, pytest-8.3.3, pluggy-1.5.0
rootdir: .../projects/waitress
configfile: setup.cfg
testpaths: tests
plugins: cov-5.0.0
collected 796 items
tests/test_adjustments.py ................................................. [ 6%]
tests/test_buffers.py .................................................... [ 12%]
tests/test_channel.py ......................................................................................................................... [ 27%]
tests/test_functional.py ...................................................................................^CTraceback (most recent call last):
  File ".../projects/waitress/src/waitress/server.py", line 325, in run
    self.asyncore.loop(
  File ".../projects/waitress/src/waitress/wasyncore.py", line 245, in loop
    poll_fun(timeout, map)
  File ".../projects/waitress/src/waitress/wasyncore.py", line 183, in poll
    read(obj)
  File ".../projects/waitress/src/waitress/wasyncore.py", line 104, in read
    obj.handle_read_event()
  File ".../projects/waitress/src/waitress/wasyncore.py", line 466, in handle_read_event
    self.handle_read()
  File ".../projects/waitress/src/waitress/channel.py", line 156, in handle_read
    data = self.recv(self.adj.recv_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../projects/waitress/src/waitress/wasyncore.py", line 409, in recv
    def recv(self, buffer_size):
  File ".../projects/waitress/.venv/lib/python3.12/site-packages/coverage/collector.py", line 252, in lock_data
    self.data_lock.acquire()
  File ".../projects/waitress/tests/test_functional.py", line 43, in sigterm
    sys.exit(0)
SystemExit: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File ".../projects/waitress/tests/test_functional.py", line 33, in start_server
    svr(app, queue, **kwargs).run()
  File ".../projects/waitress/src/waitress/server.py", line 331, in run
    self.task_dispatcher.shutdown()
  File ".../projects/waitress/src/waitress/task.py", line 118, in shutdown
    def shutdown(self, cancel_pending=True, timeout=5):
  File ".../projects/waitress/.venv/lib/python3.12/site-packages/coverage/collector.py", line 252, in lock_data
    self.data_lock.acquire()
KeyboardInterrupt
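One possible reading of the trace, offered as an assumption and not a confirmed diagnosis: the SIGTERM handler in the test raised SystemExit while execution was inside lock_data, and the later call to lock_data during shutdown then waited on a lock that was never released. The sketch below only illustrates that general failure mode with a plain threading.Lock. The names record_event and lock_left_held are hypothetical and are not part of coverage.py.

```python
import threading


def record_event(lock, interrupt):
    """Hypothetical stand-in for code that takes a lock around some work."""
    lock.acquire()
    if interrupt:
        # Simulates a signal handler calling sys.exit(0) at this exact point,
        # after the lock was taken but before it could be released.
        raise SystemExit(0)
    lock.release()


def lock_left_held():
    """Return True if an interrupted record_event leaves the lock held."""
    lock = threading.Lock()  # non-reentrant, like a plain mutex
    try:
        record_event(lock, interrupt=True)
    except SystemExit:
        pass  # the process keeps running its shutdown code
    # A second acquire without a timeout would block forever here.
    acquired = lock.acquire(timeout=0.1)
    if acquired:
        lock.release()
    return not acquired
```

In this sketch the second acquire times out, which is the single-process analogue of a test run that never finishes. Whether this matches what coverage does internally is not established by the trace alone.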
This is how it looks in CI, until it times out:
I myself develop on macOS (M1 MacBook Pro) and have not been able to reproduce the issue at all locally. Turning coverage off in CI runs made the issue go away, so I did some testing:
- Downgraded to 7.5.4: still hung
- Downgraded to 7.4.4: did not hang
- Then worked my way back up to the newest version that does not hang, which is 7.5.3.
https://github.com/Pylons/waitress/pull/454
The PR above shows the various attempts and contains the GitHub Actions runs, so you can view them.
To Reproduce
How can we reproduce the problem? Please be specific. Don't link to a failing CI job. Answer the questions below:
- What version of Python are you using?
  - Python 3.9
  - Python 3.10
  - Python 3.11
  - Python 3.12
  - Python 3.13
- What version of coverage.py shows the problem? The output of coverage debug sys is helpful.
  - 7.6.5
  - 7.5.5
- What versions of what packages do you have installed? The output of pip freeze is helpful.
  - coverage==7.6.5
  - iniconfig==2.0.0
  - packaging==24.2
  - pip==24.3.1
  - pluggy==1.5.0
  - pytest==8.3.3
  - pytest-cov==6.0.0
- What code shows the problem? Give us a specific commit of a specific repo that we can check out. If you've already worked around the problem, please provide a commit before that fix.
  - Issue exists on main on https://github.com/Pylons/waitress
  - Commit sha1: https://github.com/Pylons/waitress/commit/23ac524459cf9bad48faabdd0bd5be43434d4af6
- What commands should we run to reproduce the problem? Be specific. Include everything, even git clone, pip install, and so on. Explain like we're five!
  - python3 -mvenv toxcmd
  - ./toxcmd/bin/pip install -U tox
  - git clone https://github.com/Pylons/waitress.git
  - cd waitress
  - ../toxcmd/bin/tox -e py
This is a race condition, so it may or may not happen on any given run. I have been unable to reproduce it outside of CI/CD. It seems to happen fairly often there, and rerunning jobs will usually allow them to succeed.
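Because the hang is intermittent, a single run says little. A minimal sketch of one way to hunt for it is to repeat the test command under a timeout and count how many runs fail to finish. The helper name count_hangs and its default values are assumptions for illustration, not part of tox, pytest, or coverage.

```python
import subprocess


def count_hangs(cmd, runs=5, timeout=600.0):
    """Run cmd `runs` times and return how many runs exceeded `timeout` seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            # Output is discarded so a stuck child cannot block on a full pipe.
            subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child before raising.
            hangs += 1
    return hangs
```

Passing the tox command from the steps above as a list of arguments, with a timeout comfortably longer than a normal run, would report how many of the runs hung. Note that only the direct child is killed on timeout, so server processes started by the tests may need to be cleaned up separately.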
Expected behavior
No deadlock/hang while running the test suite with newer versions of coverage.
Additional context
This is a race condition. I'm sorry, I haven't been able to reproduce it at all locally, so I can't provide any more data or debug information.