Add support for free-threading builds of CPython
- #231
In addition to the failure you left a comment on, I also see a different failure that is more obviously a thread-safety issue in hypothesis itself:
```
self = <tests.test_decompressor_fuzzing.TestDecompressor_stream_reader_fuzzing testMethod=test_buffer_source_constant_read_size>

    @hypothesis.settings(
>       suppress_health_check=[hypothesis.HealthCheck.large_base_example]
    )

tests/test_decompressor_fuzzing.py:145:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
state = <hypothesis.core.StateForActualGivenExecution object at 0x4b3c0220190>
wrapped_test = <function accept.<locals>.test_buffer_source_constant_read_size at 0x4b3b4dc8fc0>, arguments = ()
kwargs = {'self': <tests.test_decompressor_fuzzing.TestDecompressor_stream_reader_fuzzing testMethod=test_buffer_source_constant_read_size>}
original_sig = <Signature (self, original, level, streaming, source_read_size, read_size)>

    def execute_explicit_examples(state, wrapped_test, arguments, kwargs, original_sig):
        assert isinstance(state, StateForActualGivenExecution)
        posargs = [
            p.name
            for p in original_sig.parameters.values()
>           if p.kind is p.POSITIONAL_OR_KEYWORD
        ]
E       RecursionError: maximum recursion depth exceeded

../../.pyenv/versions/3.13.0t/lib/python3.13t/site-packages/hypothesis/core.py:433: RecursionError
```
Hypothesis even warns about this:
```
tests/test_decompressor_fuzzing.py: 17 warnings
  /Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/site-packages/hypothesis/core.py:822: HypothesisWarning: The recursion limit will not be reset, since it was changed from another thread or during execution of a test.
    with ensure_free_stackframes():

tests/test_decompressor_fuzzing.py: 17 warnings
  /Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/site-packages/hypothesis/internal/conjecture/engine.py:316: HypothesisWarning: The recursion limit will not be reset, since it was changed from another thread or during execution of a test.
    with ensure_free_stackframes():

tests/test_decompressor_fuzzing.py: 10 warnings
  /Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/site-packages/hypothesis/core.py:643: HypothesisWarning: The recursion limit will not be reset, since it was changed from another thread or during execution of a test.
    with ensure_free_stackframes():
```
So I suspect the other failure is caused by a similar problem.
And indeed, the hypothesis docs say that running hypothesis simultaneously in multiple threads is not supported:
https://hypothesis.readthedocs.io/en/latest/details.html#thread-safety-policy
I'll open an issue to document this in the free-threading porting guide and an issue on pytest-run-parallel to hopefully detect this and warn about it.
I think the fix for the python-zstandard tests is to mark any tests using hypothesis as thread-unsafe.
It's also probably worth adding some tests for sharing a compressor or decompressor between threads. It looks like the GIL does get released when calling into the C zstd library, so any C-level thread safety issues that exist from sharing zstd contexts between threads are probably also present in the GIL-enabled build and no one has reported them yet.
mm, it's weird that I'm not seeing that.
how are you running the tests?
I'm doing `act -W .github/workflows/test.yml`
I'm running it all locally on my mac dev machine. I installed the library with `pip install -e .`, then ran `ZSTD_SLOW_TESTS=1 pytest --numprocesses=1 --parallel-threads=10 -v tests/test_decompressor_fuzzing.py` after installing the CI requirements.
I've added thread_unsafe annotations to all tests that use hypothesis.
I'm seeing a lot of warnings with the annotations, am I using it wrong?

```
PytestUnknownMarkWarning: Unknown pytest.mark.thread_unsafe - is this a typo?
```
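For what it's worth, that warning usually means the marker was never registered with pytest in the environment where the tests ran (pytest-run-parallel registers it when the plugin is active). As a speculative fallback, assuming you want the annotations to be inert when the plugin is missing, you could register the marker manually:

```python
# conftest.py
# Speculative fallback: register the thread_unsafe marker ourselves so
# pytest does not emit PytestUnknownMarkWarning in environments where
# pytest-run-parallel is not installed or not active. When the plugin is
# loaded it registers the marker itself, so this is redundant but harmless.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "thread_unsafe: this test cannot safely run in multiple threads",
    )
```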
I've also added a test for a compressor object that is shared between several threads, and it does cause a segmentation fault both in the free-threaded and in the default build.
This mode of passing data to de/compression contexts doesn't seem to make sense for python-zstandard anyway. The library does support multithreaded compression, but it spawns its own pool of threads internally; there's probably no reason for a Python program to feed data into a de/compressor from multiple threads.
You can make it thread-safe if you do something like this to add a per-decompressor lock: https://py-free-threading.github.io/porting/#dealing-with-thread-unsafe-libraries. Of course that won't scale well but as you said it's a weird thing to do.
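The per-object-lock pattern from that guide can be sketched in pure Python. Here zlib's compression context stands in for a zstd one, since the shape of the fix is the same; the class and method names are illustrative, not python-zstandard's actual API:

```python
import threading
import zlib

class LockedCompressor:
    """Per-object-lock sketch: serialize all use of one shared compression
    context. zlib stands in for a zstd context here."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cobj = zlib.compressobj()
        self._pieces = []

    def compress_chunk(self, data):
        # Only one thread at a time may drive the shared context. The
        # append happens under the same lock so block order matches
        # compression order, and Z_SYNC_FLUSH keeps the stream
        # byte-aligned between callers.
        with self._lock:
            out = self._cobj.compress(data)
            out += self._cobj.flush(zlib.Z_SYNC_FLUSH)
            self._pieces.append(out)

    def finish(self):
        with self._lock:
            return b"".join(self._pieces) + self._cobj.flush()

# Eight threads feed one shared context; the lock keeps the stream coherent.
shared = LockedCompressor()
threads = [
    threading.Thread(target=shared.compress_chunk, args=(b"msg%03d;" % n,))
    for n in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The concatenated stream decompresses cleanly; the messages appear in
# whichever order the threads acquired the lock.
recovered = zlib.decompress(shared.finish())
```

As you said, this serializes everything through one lock, so it is safe but won't scale.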
Does the test segfault with the GIL too? I wouldn't be surprised if it does.
If it does, the fact that no one has reported this issue means it's not a big problem in practice and maybe you can just document the thread safety caveats?
I have a suspicion that python threading will spike in popularity soon as people adopt free-threading, so it was probably inevitable that someone would hit this eventually and in that case it probably is worth adding the locking.
Oh, you said it's a bug in the default build; I missed that. We've generally been trying to fix pre-existing thread safety issues that can be triggered in the default build when we can, but we don't see them as blockers for shipping wheels.
Maybe @andfoy or @lysnikolaou know what's up with the warning.
> so any C-level thread safety issues that exist from sharing zstd contexts between threads are probably also present in the GIL-enabled build and no one has reported them yet.
The docs in api_usage.rst state our policy around API thread safety.
We need to ensure we don't crash if someone violates the "concurrent operations on multiple threads" rule. I'm fine with undefined behavior if someone attempts operations on the same zstd context from multiple threads. But I would prefer detecting and raising an error if this can be implemented with minimal runtime cost.
You could have an atomic flag that a thread sets when it acquires the context, if another thread tries to acquire a context with the flag set then that would be an error.
It's a little obnoxious to write cross-platform C code that uses atomics (see npy_atomic.h in NumPy for my basic attempt, based on the private CPython atomics headers), but in Rust, or in C++11 or newer, it's pretty straightforward.
One fairly easy cross-platform option is to just use `PyMutex` behind some `#ifdef Py_VERSION...`, which would only affect some builds (notably the free-threading ones).
Essentially, for Python >= 3.13 we could guarantee to throw an exception around concurrent use, and for prior versions we could retain the current behavior. As Gregory pointed out, the documentation is pretty clear on this:
> ZstdCompressor and ZstdDecompressor instances have no guarantees about thread safety. Do not operate on the same ZstdCompressor and ZstdDecompressor instance simultaneously from different threads. It is fine to have different threads call into a single instance, just not at the same time.
not sure if I should do it in this PR or open a new one.
You can also use PyThread_type_lock on older python versions. It's sadly undocumented in CPython but you can take a look at what I did to make NumPy's use of lapack_lite thread safe to see how to conditionally use either depending on Python version: https://github.com/numpy/numpy/blob/main/numpy/linalg/umath_linalg.cpp. Grep for HAVE_EXTERNAL_LAPACK and LOCK_LAPACK_LITE.
The main differences from PyMutex are that it's slower, it supports a try-lock API (which you probably don't need), and it requires a heap allocation.
@ngoldbaum I've ended up opting for an atomic flag, partly based on your npy_atomic.h. Thank you for sharing it!
@indygreg the modifications I just pushed make it so that when a ZstdCompressor is concurrently used, only one thread will succeed in using it.
All other threads get a ZstdError instead, with a message pointing to the wiki.
Is this desirable? Should I apply this rule throughout the library?
I believe the performance impact is as small as it can possibly be.
The additional overhead is just one relaxed read to check if there's a thread using the ZstdCompressor object and one atomic write to mark it as used, if it isn't already.
@indygreg we'd appreciate it if you could give us your opinion on this approach and/or some code review.
It would be really helpful to have cp313t wheels available. One particularly disruptive impact of python-zstandard failing to build on free-threaded Python at the moment is that `pip install hatch` does not work, so any project using hatch/hatchling can't easily support free-threaded Python. Even without wheels, a release with this PR applied would help enormously, because the python-zstandard build would succeed on CI machines with developer tools installed.
There are probably things that could happen inside hatch to avoid this issue, but I think the "right" fix is for python-zstandard to help out a bit.
Hi Gregory!
Thanks for coming back to us!
Tomorrow I'll take a look at bumping the version numbers.
Please split the @pytest.mark.thread_unsafe annotations out into their own PR, along with adding the initial FT coverage to CI. I want to see the main branch running FT builds successfully, even if like 90% of tests are skipped.
Do you mean you would like the annotations merged beforehand?
If we were to run the CI against a free-threading interpreter, we would trigger the mechanism that turns the GIL back on at runtime, since the module doesn't set the `Py_MOD_GIL_NOT_USED` flag.
We would effectively be testing with the GIL twice. Is this what you intended?
> [...] when a ZstdCompressor is concurrently used, only one thread will succeed in using it. All other threads get a ZstdError instead, with a message pointing to the wiki.
Do you have any thoughts on this?
Actually, looking at this PR more and the prior discussion, I'm a bit confused what the purpose of the atomics code actually is.
Our existing docs say that ZstdCompressor and ZstdDecompressor instances aren't thread safe. It was already possible for customers to footgun themselves on GIL builds by calling into multiple methods simultaneously since the GIL would be released when calling into libzstd C code.
The atomics - if implemented globally (which this PR doesn't yet do) - could be a nice quality-of-life guard to catch code patterns that violate our thread safety guarantees. Is that the only problem it solves?
I initially/naively thought that free-threaded builds would require a fair amount of invasive code changes to prevent SEGFAULTs or similar crashes. But I think our existing API contract is essentially already free-threaded compatible since we're purposefully not multi-threaded safe at the ZstdCompressor and ZstdDecompressor level? If so and there is no new potential for crashes in free-threaded builds, I'm inclined to do the minimal thing possible to support free-threaded builds. We can then look into the lightweight atomic-based context guards as a quality-of-life improvement as a separate and follow-up PR.
> Is that the only problem it solves?
Yes, exactly right.
> I'm inclined to do the minimal thing possible to support free-threaded builds. We can then look into the lightweight atomic-based context guards as a quality-of-life improvement as a separate and follow-up PR.
That's also the approach we took with NumPy: we didn't initially put effort into fixing preexisting thread safety issues on the GIL-enabled build, and focused instead on free-threading-specific issues to get an initial release out.
I'll remove the checks and the new test.
I have reverted the changes regarding the runtime context-sharing checks and have merged main, but I won't have time to go through the version bumps today.
Re-running pip-compile did not change the requirements file.
@indygreg I didn't quite get what you want to do with the tests. I think they should get checked in like this, but up to you.
If we added the CI for free-threading without the rest of the changes, we wouldn't be actually testing much in addition to the tests against the default build of 3.13.
But if you prefer I can split the @pytest.mark.thread_unsafe annotations and the CI step.
I think the big problem we have now is deciding what to do about this:
```toml
[build-system]
requires = [
    # "cffi>=1.17.0",  # ok for default, nok for free-threading
    "setuptools",
]
```
Removing the line does not cause any installation issues for me locally, but I suppose some users expect the cffi bindings to be installed, right? If that build dependency is necessary, then we're probably blocked until cffi releases a free-threading-compatible version.
@dpdani let me know if you want a hand
Yesterday I made the change that I just pushed, but didn't have the time to go further than that. I probably won't have much time to work on this until next week. Feel free to chip in, Nathan!
@dpdani I disabled the cffi backend in setup.py and removed all the pytest-run-parallel usage in the feature/3.13t branch on my fork. You can either pull from me and push here, or give me push access to your fork, whichever you prefer.
See https://github.com/ngoldbaum/python-zstandard/tree/feature/3.13t.
Also, all the tests pass with no skips if I set `ZSTD_SLOW_TESTS=1`.
@ngoldbaum invitation sent
I opened https://github.com/indygreg/python-zstandard/pull/257 to update to pyo3 0.22, which if merged would allow this PR to only turn off the rust backend for the free-threaded build.
Thanks @dpdani! I also tried running GitHub Actions on my fork and found some issues with the setup-uv usage. If you check out the repo after setting up the uv venv, it blows away the venv!
I think things should be working now whenever @indygreg approves the CI run on the PR.
@indygreg can you trigger the CI here?
Thanks!
It seems there's a problem with the modifications in setup.py when building for PyPy 3.9 on Windows, @ngoldbaum
Actually, I don't think it's related to this PR.