
Race condition leading to hanging tests using coverage >=7.5.5

Open digitalresistor opened this issue 1 year ago • 4 comments

Describe the bug

On the waitress project we use coverage along with pytest-cov to measure coverage on every run. Recently a new contribution triggered CI across the whole test matrix, and some of those runs hung in tests/test_functional.py. These tests spin up a server (with threads) in a separate process using multiprocessing.
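For readers unfamiliar with this kind of test, the general pattern can be sketched as follows. This is a hypothetical, simplified harness, not waitress's actual tests/test_functional.py. The names `start_server` and `sigterm` are borrowed from the traceback below. Everything else is an assumption made to keep the sketch self-contained: `run_and_stop`, the readiness queue handshake, the sleep loop standing in for the server, and the POSIX-only "fork" start method.

```python
import multiprocessing
import signal
import sys
import time


def sigterm(signum, frame):
    # Turn SIGTERM into a normal interpreter exit so that cleanup code
    # (including a coverage tracer's data saving) gets a chance to run.
    sys.exit(0)


def start_server(ready):
    # Hypothetical stand-in for the server process: install the handler,
    # report readiness, then "serve" until SIGTERM arrives.
    signal.signal(signal.SIGTERM, sigterm)
    ready.put(True)
    while True:
        time.sleep(0.05)


def run_and_stop(join_timeout=5.0):
    # "fork" avoids re-importing __main__ in the child (POSIX only).
    ctx = multiprocessing.get_context("fork")
    ready = ctx.Queue()
    proc = ctx.Process(target=start_server, args=(ready,))
    proc.start()
    ready.get(timeout=join_timeout)  # wait until the handler is installed
    proc.terminate()                 # sends SIGTERM on POSIX
    proc.join(join_timeout)
    if proc.is_alive():
        # A child that never finishes exiting (like the hang in this report)
        # ends up here instead of blocking the caller forever.
        proc.kill()
        proc.join()
        return None
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", run_and_stop())
```

Because the handler converts SIGTERM into `sys.exit(0)`, a child that shuts down cleanly reports exit code 0. If the exit path blocks, `run_and_stop` returns None after the join timeout.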

The contributor who was making the changes caught the issue and provided a stack trace after pressing Ctrl+C to interrupt the hung test suite:

https://github.com/Pylons/waitress/pull/446#issuecomment-2439999668

Copied in its entirety here:

platform linux -- Python 3.12.7, pytest-8.3.3, pluggy-1.5.0
rootdir: .../projects/waitress
configfile: setup.cfg
testpaths: tests
plugins: cov-5.0.0
collected 796 items

tests/test_adjustments.py ................................................. [  6%]
tests/test_buffers.py .................................................... [ 12%]
tests/test_channel.py ......................................................................................................................... [ 27%]
tests/test_functional.py ...................................................................................^CTraceback (most recent call last):
  File ".../projects/waitress/src/waitress/server.py", line 325, in run
    self.asyncore.loop(
  File ".../projects/waitress/src/waitress/wasyncore.py", line 245, in loop
    poll_fun(timeout, map)
  File ".../projects/waitress/src/waitress/wasyncore.py", line 183, in poll
    read(obj)
  File ".../projects/waitress/src/waitress/wasyncore.py", line 104, in read
    obj.handle_read_event()
  File ".../projects/waitress/src/waitress/wasyncore.py", line 466, in handle_read_event
    self.handle_read()
  File ".../projects/waitress/src/waitress/channel.py", line 156, in handle_read
    data = self.recv(self.adj.recv_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../projects/waitress/src/waitress/wasyncore.py", line 409, in recv
    def recv(self, buffer_size):
    
  File ".../projects/waitress/.venv/lib/python3.12/site-packages/coverage/collector.py", line 252, in lock_data
    self.data_lock.acquire()
  File ".../projects/waitress/tests/test_functional.py", line 43, in sigterm
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File ".../projects/waitress/tests/test_functional.py", line 33, in start_server
    svr(app, queue, **kwargs).run()
  File ".../projects/waitress/src/waitress/server.py", line 331, in run
    self.task_dispatcher.shutdown()
  File ".../projects/waitress/src/waitress/task.py", line 118, in shutdown
    def shutdown(self, cancel_pending=True, timeout=5):
    
  File ".../projects/waitress/.venv/lib/python3.12/site-packages/coverage/collector.py", line 252, in lock_data
    self.data_lock.acquire()
KeyboardInterrupt
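One plausible reading of the two tracebacks, offered as an interpretation rather than a confirmed diagnosis, goes like this. The SIGTERM handler (`sigterm` in tests/test_functional.py) ran while the server thread was inside coverage's `lock_data`. Its `sys.exit(0)` unwound the stack before `data_lock` was released. The shutdown path (`task_dispatcher.shutdown()`) then blocked trying to acquire the same lock again. The sketch below reproduces only that general pattern with a plain `threading.Lock`. It is not coverage.py's code, and `record_unsafe`, `record_safe` and `lock_free_after_exit` are hypothetical names. `signal.raise_signal` delivers the signal at a deterministic point.

```python
import signal
import sys
import threading


def sigterm(signum, frame):
    # Same shape as the test helper: exit from inside a signal handler.
    sys.exit(0)


def record_unsafe(lock):
    # acquire()/release() with no try/finally: an exception raised in between
    # (here, SystemExit from the signal handler) skips release().
    lock.acquire()
    signal.raise_signal(signal.SIGTERM)  # signal "arrives" in the critical section
    lock.release()


def record_safe(lock):
    # The context manager releases the lock even when SystemExit propagates.
    with lock:
        signal.raise_signal(signal.SIGTERM)


def lock_free_after_exit(record, timeout=0.2):
    """Return True if the lock can be re-acquired on the shutdown path."""
    lock = threading.Lock()  # hypothetical stand-in for a tracer's data lock
    previous = signal.signal(signal.SIGTERM, sigterm)
    try:
        try:
            record(lock)
        except SystemExit:
            pass  # shutdown code runs here and, under tracing, needs the lock again
        # A plain acquire() would block forever; the timeout lets us report it.
        return lock.acquire(timeout=timeout)
    finally:
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    print("unsafe:", lock_free_after_exit(record_unsafe))
    print("safe:", lock_free_after_exit(record_safe))
```

Run as a script on the main thread, this prints `unsafe: False` and `safe: True`. The `with` form narrows the window. Depending on exactly where a signal lands, it may not close it completely, because CPython does not guarantee that `__exit__` runs if a signal interrupts right after `__enter__` returns.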

This is how it looks in CI, until it times out:

[Screenshot, 2024-11-14: CI job hanging until it times out]

I develop on macOS (M1 MacBook Pro) and have not been able to reproduce the issue locally at all. Turning coverage off in CI runs made the issue go away, so I did some testing:

  • I started by downgrading to 7.5.4: it still hung.
  • Downgraded to 7.4.4: it did not hang.
  • Then worked my way back up one version at a time; the newest version that works is 7.5.3.

https://github.com/Pylons/waitress/pull/454

This pull request shows the various changes and contains the CI runs for each, so you can view them.
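The version-by-version walk described above is a linear search. The sketch below shows the same search as a bisection, under stated assumptions. Versions are listed in release order, and there is a single regression, so every version after the first bad one is also bad. `hangs` is a hypothetical probe you supply, for example one that installs a given coverage version and runs the suite. Because the race does not reproduce on every run, `repeated` (also hypothetical) treats a version as bad if any of several runs hangs. A version that passes may still be bad, so more runs reduce, but do not eliminate, the chance of a wrong answer.

```python
from typing import Callable, Optional, Sequence


def repeated(probe: Callable[[str], bool], runs: int) -> Callable[[str], bool]:
    # An intermittent race can pass by luck; call a version bad if any run hangs.
    def check(version: str) -> bool:
        return any(probe(version) for _ in range(runs))
    return check


def first_bad_version(
    versions: Sequence[str],
    hangs: Callable[[str], bool],
) -> Optional[str]:
    """Binary search for the first version where ``hangs`` returns True.

    Assumes release order and a single regression (bad stays bad).
    """
    if not versions or not hangs(versions[-1]):
        return None  # newest version is fine (or nothing to search)
    if hangs(versions[0]):
        return versions[0]  # already bad at the oldest version tried
    # Invariant: versions[lo] is good, versions[hi] is bad.
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hangs(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

Bisection needs about log2(n) probes instead of up to n. That matters here because each probe should itself be several full test runs.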

To Reproduce

How can we reproduce the problem? Please be specific. Don't link to a failing CI job. Answer the questions below:

  1. What version of Python are you using?
    • Python 3.9
    • Python 3.10
    • Python 3.11
    • Python 3.12
    • Python 3.13
  2. What version of coverage.py shows the problem? The output of coverage debug sys is helpful.
    • 7.6.5
    • 7.5.5
  3. What versions of what packages do you have installed? The output of pip freeze is helpful.
    • coverage==7.6.5
    • iniconfig==2.0.0
    • packaging==24.2
    • pip==24.3.1
    • pluggy==1.5.0
    • pytest==8.3.3
    • pytest-cov==6.0.0
  4. What code shows the problem? Give us a specific commit of a specific repo that we can check out. If you've already worked around the problem, please provide a commit before that fix.
    • Issue exists on main on https://github.com/Pylons/waitress
    • Commit sha1: https://github.com/Pylons/waitress/commit/23ac524459cf9bad48faabdd0bd5be43434d4af6
  5. What commands should we run to reproduce the problem? Be specific. Include everything, even git clone, pip install, and so on. Explain like we're five!
    • python3 -mvenv toxcmd
    • ./toxcmd/bin/pip install -U tox
    • git clone https://github.com/Pylons/waitress.git
    • cd waitress
    • ../toxcmd/bin/tox -e py

This is a race condition, so it may or may not happen. I have been unable to reproduce it outside of CI/CD. It seems to happen fairly often, and rerunning the jobs usually lets them succeed.
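Because the hang is intermittent, one way to estimate how often it occurs (for example, per coverage version) is to run the suite repeatedly with a hard timeout. The sketch below is a hypothetical helper, not part of waitress, tox, or coverage.py. The name `count_hangs`, the run count, and the timeout value are assumptions to adjust.

```python
import subprocess
import sys
from typing import Sequence


def count_hangs(cmd: Sequence[str], runs: int, timeout: float) -> int:
    """Run ``cmd`` ``runs`` times; count runs that exceed ``timeout`` seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(cmd, timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising TimeoutExpired.
            hangs += 1
    return hangs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_hangs.py tox -e py
    print("hung runs:", count_hangs(sys.argv[1:], runs=20, timeout=600))
```

On timeout, `subprocess.run` kills only the direct child. Server processes started by the test suite may therefore be left running and need to be cleaned up separately.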

Expected behavior

No deadlock/hang while running the test suite with newer versions of coverage.

Additional context

This is a race condition. I'm sorry, but I haven't been able to reproduce it locally at all, so I can't provide any more data or debug information.

digitalresistor, Nov 15 '24 03:11