coveragepy icon indicating copy to clipboard operation
coveragepy copied to clipboard

Issues found running the tests with pytest-run-parallel

Open ngoldbaum opened this issue 5 months ago • 3 comments

Describe the bug

If I run the coveragepy test suite with some small modifications to run with TSAN support and under pytest-run-parallel, I see data races inside coveragepy and CPython as well as some test failures.

pytest-run-parallel is a pytest plugin that replaces the PyTest test executor with a function that creates a thread pool and spawns many instances of the same test to run simultaneously. It can detect thread safety issues due to use of global state in the implementation of python modules. In practice, it tends to most often find issues in test suites due to the way it handles fixtures by implicitly sharing module-scope fixtures between tests. However, it's still worth going through the exercise IMO because it's one of the few automated ways to exercise big chunks of a Python codebase in a multithreaded environment. See https://py-free-threading.github.io/testing/ for more guidance on multithreaded testing from the team working on ecosystem support for free-threaded Python.

The issues in CPython have been reported upstream (https://github.com/python/cpython/pull/136994), I'm reporting the races inside coveragepy here.

I made the following modification to build coveragepy with thread sanitizer when I run a py314t environment:

diff --git a/tox.ini b/tox.ini
index 81bb0481..c162d7e3 100644
--- a/tox.ini
+++ b/tox.ini
@@ -19,6 +19,7 @@ extras =
 deps =
     -r requirements/pip.pip
     -r requirements/pytest.pip
+    pytest-run-parallel
     py3{9,10,11}: -r requirements/light-threads.pip
 
 # Windows can't update the pip version with pip running, so use Python
@@ -38,6 +39,7 @@ setenv =
     PYTHONWARNDEFAULTENCODING=
     py3{10,11,12,13,14}: PYTHONWARNDEFAULTENCODING=1
     py3{10,11,12,13,14}: PYTHONWARNINGS=ignore::EncodingWarning:pip._internal.utils.subprocess
+    py314t: CFLAGS=-fsanitize=thread
     # Disable CPython's color output
     PYTHON_COLORS=0
 
@@ -51,7 +53,7 @@ commands =
     python igor.py remove_extension
 
     # Test with the PyTracer
-    python igor.py test_with_core pytrace {posargs}
+    # python igor.py test_with_core pytrace {posargs}
 
     # Build the C extension and test with the CTracer
     python setup.py --quiet build_ext --inplace

I used a CPython 3.14t interpreter compiled with thread sanitizer support on my ARM Mac using pyenv. There are docker images you can use if you don't want to handle building CPython: https://github.com/nascheme/cpython_sanity

Here's a gist with the full output from pytest: https://gist.github.com/ngoldbaum/2c487990da8482f3d4a7d5ab451229b1

The failures are here:

PARALLEL FAILED tests/test_concurrency.py::test_coverage_stop_in_threads - AssertionError: Expected current collector to be <Collector at 0x110201550: CTracer>, but it's <Collector at 0x10e1a0910: CTracer>
PARALLEL FAILED tests/test_concurrency.py::test_thread_safe_save_data - FileExistsError: [Errno 17] File exists: '/private/var/folders/nk/yds4mlh97kg9qdq745g715rw0000gn/T/pytest-of-goldbaum/pytest-1/test_thread_safe_save_...
PARALLEL FAILED tests/test_misc.py::test_stdout_link_not_tty - AssertionError: assert '\x1b]8;;some url\x07some text\x1b]8;;\x07' == 'some text'
PARALLEL FAILED tests/test_mixins.py::TempDirMixinTest::test_make_file - AssertionError: assert '' == 'Hello\nBye\n'
PARALLEL FAILED tests/test_mixins.py::TempDirMixinTest::test_make_bytes_file - AssertionError: assert b'' == b'\x993fhello\x00'
PARALLEL FAILED tests/test_mixins.py::RestoreModulessMixinTest::test_cleanup_and_reimport - KeyError: 'xyzzy'

Just looking at the names of the tests and the assertion errors, some of these are due to using global state like sys.modules or the filesystem in tests. There are also some tests that spawn and stop threads and I could see those tests running parallel doing something that is not anticipated by the current implementation.

There are also around 50 races reported from the ctracer internals. I see CTracer_stop, CTracer_trace, CTracer_set_pdata_stack, CTracer_handle_return, DataStack_grow, and CTracer_handle_call showing up in race reports.

To Reproduce How can we reproduce the problem? Please be specific. Don't link to a failing CI job. Answer the questions below:

  1. What version of Python are you using? Python 3.14.0rc1+ free-threading build (heads/3.14:b7168d2, Jul 24 2025, 10:22:50) [Clang 17.0.0 (clang-1700.0.13.5)]
  2. What version of coverage.py shows the problem? The output of coverage debug sys is helpful. https://gist.github.com/ngoldbaum/66bc9fce5181d0ba76ba7d48e167c9ad
  3. What versions of what packages do you have installed? The output of pip freeze is helpful. https://gist.github.com/ngoldbaum/f053b5d582e487402f50474d9d3d58b2
  4. What code shows the problem? Give us a specific commit of a specific repo that we can check out. If you've already worked around the problem, please provide a commit before that fix. I tested with 99a97aa1473ff7db9715a849913f83c2f7954308.
  5. What commands should we run to reproduce the problem? Be specific. Include everything, even git clone, pip install, and so on. Explain like we're five!

If you just want to see the failures and run the tests without TSAN, then with a free-threaded 3.14rc1 interpreter, do the following in a checkout of the coveragepy repo that has been modified to add pytest-run-parallel as a tox test dependency:

diff --git a/tox.ini b/tox.ini
index 81bb0481..36e0160e 100644
--- a/tox.ini
+++ b/tox.ini
@@ -19,6 +19,7 @@ extras =
 deps =
     -r requirements/pip.pip
     -r requirements/pytest.pip
+    pytest-run-parallel
     py3{9,10,11}: -r requirements/light-threads.pip

 # Windows can't update the pip version with pip running, so use Python
$ tox -e py314t -- -sv -n 0 --parallel-threads=4 --skip-thread-unsafe=true

If you're able to run the cpython-tsan:3.14t docker image from cpython_sanity, then you should be able to reproduce this with TSAN as well inside the docker container.

Expected behavior Running the coveragepy tests under pytest-run-parallel with no data race reports, or only reports for known issues in CPython.

Additional context I initially hit this trying to run the python-isal tests using pytest-run-parallel. Their tests run under coverage in their tox config, so my TSAN instrumentation happened to cover coverage itself. See https://github.com/pycompression/python-isal/pull/233#issuecomment-3114284590.

ngoldbaum avatar Jul 24 '25 18:07 ngoldbaum

I'm interested in making improvements, but am going to need help.

nedbat avatar Jul 25 '25 21:07 nedbat

I created https://github.com/nedbat/coveragepy/pull/2018 to fix data races at https://gist.github.com/kumaraditya303/1f6f3de92b0d0e503f6a5cdf8d6d7ece

Analysis:

Currently when a CTracer is stopped, the started flag is set to false and let the running thread handle removing the tracer by calling PyEval_SetTrace(NULL, NULL). While resuming CTracer.start is called for all tracers which sets the started flag and calls PyEval_SetTrace however that is not safe, as that causes multiple tracers to be installed for the current thread. This causes data races in free-threading because as soon as tracer is installed, it can start tracing and you can end up changing the tracer while it is running. To fix this, instead of calling start directly on current thread for all ctracers which is unsafe, leave it for threading.settrace or self._start_tracer() to set it.

kumaraditya303 avatar Aug 08 '25 11:08 kumaraditya303

https://github.com/nedbat/coveragepy/pull/2019 fixes the remaining data races by using atomics for started and activity fields. atomics is required because started can be set concurrently while it is being read in a different thread, using atomic fixes this.

kumaraditya303 avatar Aug 08 '25 12:08 kumaraditya303

Since gh-2018 and gh-2019 as well as the upstream CPython fixes were merged, is there anything left to do or can this be closed?

rgommers avatar Dec 15 '25 14:12 rgommers

Since https://github.com/coveragepy/coveragepy/pull/2018 and https://github.com/coveragepy/coveragepy/pull/2019 as well as the upstream CPython fixes were merged, is there anything left to do or can this be closed?

https://github.com/coveragepy/coveragepy/pull/2018 was reverted by https://github.com/coveragepy/coveragepy/pull/2027 because it broke nested tracers which was untested at that time. I wasn't able to figure out a way to fix that but the more important fix is in place https://github.com/coveragepy/coveragepy/pull/2019.

kumaraditya303 avatar Dec 16 '25 09:12 kumaraditya303