Declare support for the free-threaded build and ship free-threaded wheels
I'm working on ecosystem support for free-threaded Python as part of my job at Quansight Labs. Cramjam showed up in the analysis I did today of commonly used Python libraries that depend on PyO3.
See https://py-free-threading.github.io for more information about free-threaded Python.
I'm also a PyO3 maintainer and helped lead the effort to ship free-threaded support in PyO3 0.23.
If you're interested in reviewing PRs and answering any questions I might have, I'd be happy to do most of the heavy lifting to ship free-threaded wheels.
That would be splendid. I've recently spent some time removing some of the experimental code due to reduced capacity lately. Happy to have a PR to accomplish this, thanks in advance!
Here's what happens if I build cramjam with an updated PyO3:
error[E0277]: `*mut std::ffi::c_void` cannot be shared between threads safely
--> src/lz4.rs:206:16
|
206 | pub struct Compressor {
| ^^^^^^^^^^ `*mut std::ffi::c_void` cannot be shared between threads safely
|
= help: within `lz4::lz4::Compressor`, the trait `std::marker::Sync` is not implemented for `*mut std::ffi::c_void`
note: required because it appears within the type `LZ4FCompressionContext`
--> /Users/goldbaum/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lz4-sys-1.11.1+lz4-1.10.0/src/lib.rs:43:12
|
43 | pub struct LZ4FCompressionContext(pub *mut c_void);
| ^^^^^^^^^^^^^^^^^^^^^^
note: required because it appears within the type `libcramjam::lz4::lz4::encoder::EncoderContext`
--> /Users/goldbaum/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lz4-1.28.0/src/encoder.rs:9:8
|
9 | struct EncoderContext {
| ^^^^^^^^^^^^^^
note: required because it appears within the type `libcramjam::lz4::lz4::Encoder<std::io::Cursor<Vec<u8>>>`
--> /Users/goldbaum/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lz4-1.28.0/src/encoder.rs:29:12
|
29 | pub struct Encoder<W> {
| ^^^^^^^
note: required because it appears within the type `Option<libcramjam::lz4::lz4::Encoder<std::io::Cursor<Vec<u8>>>>`
--> /Users/goldbaum/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/option.rs:572:10
|
572 | pub enum Option<T> {
| ^^^^^^
note: required because it appears within the type `lz4::lz4::Compressor`
--> src/lz4.rs:206:16
|
206 | pub struct Compressor {
| ^^^^^^^^^^
note: required by a bound in `assert_pyclass_sync`
--> /Users/goldbaum/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.23.4/src/impl_/pyclass/assertions.rs:12:22
|
10 | pub const fn assert_pyclass_sync<T>()
| ------------------- required by a bound in this function
11 | where
12 | T: PyClassSync + Sync,
| ^^^^ required by this bound in `assert_pyclass_sync`
warning: use of deprecated associated function `pyo3::types::PyBytes::new_bound_with`: renamed to `PyBytes::new_with`
--> src/io.rs:517:29
|
517 | Some(n) => PyBytes::new_bound_with(py, n, |buf| {
| ^^^^^^^^^^^^^^
|
= note: `#[warn(deprecated)]` on by default
warning: use of deprecated associated function `pyo3::types::PyBytes::new_bound`: renamed to `PyBytes::new`
--> src/io.rs:524:25
|
524 | Ok(PyBytes::new_bound(py, buf.as_slice()))
| ^^^^^^^^^
The two warnings should be easy to fix, but the error is harder. Here's the most pertinent bit:
= help: within `lz4::lz4::Compressor`, the trait `std::marker::Sync` is not implemented for `*mut std::ffi::c_void`
note: required because it appears within the type `LZ4FCompressionContext`
--> /Users/goldbaum/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lz4-sys-1.11.1+lz4-1.10.0/src/lib.rs:43:12
|
43 | pub struct LZ4FCompressionContext(pub *mut c_void);
| ^^^^^^^^^^^^^^^^^^^^^^
It looks like the lz4 crate wraps a raw C pointer directly.
Maybe the Compressor class should access the compressor context via a mutex? This issue on the lz4 C library indicates that it's not safe for two threads to simultaneously work with a compression context, so I think a mutex is the right choice to get interior mutability.
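For concreteness, here's a minimal sketch of that idea (hypothetical field and method names, not cramjam's actual code or the linked PR). The Mutex supplies the missing Sync; Send should already hold, since pre-0.23 PyO3 required pyclasses to be Send and cramjam built against those versions.

```rust
use std::io::Cursor;
use std::sync::Mutex;

use pyo3::exceptions::{PyIOError, PyRuntimeError};
use pyo3::prelude::*;

#[pyclass]
pub struct Compressor {
    // Mutex<Option<...>> gives interior mutability and makes the pyclass Sync;
    // the Option lets a finish() method take ownership of the encoder when the
    // stream is ended.
    inner: Mutex<Option<lz4::Encoder<Cursor<Vec<u8>>>>>,
}

#[pymethods]
impl Compressor {
    /// Compress a chunk of input, returning how many input bytes were consumed.
    fn compress(&self, data: &[u8]) -> PyResult<usize> {
        let mut guard = self.inner.lock().unwrap();
        let encoder = guard
            .as_mut()
            .ok_or_else(|| PyRuntimeError::new_err("compressor has already been finished"))?;
        std::io::Write::write(encoder, data).map_err(|e| PyIOError::new_err(e.to_string()))
    }
}
```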
I also noticed the PythonBuffer struct, which has unsafe impl Sync and Send and relies on the GIL for exclusive access to the buffer. I think the free-threaded build probably needs to wrap that with a Mutex, although see this blog post for why the buffer protocol is fundamentally thread-unsafe: there's not much a rust mutex can do about Python threads accessing the buffer outside of the rust runtime.
See https://github.com/milesgranger/cramjam/pull/200 for an attempt at wrapping the decompressor context with a mutex.
Uffda, it's interesting; I wish I had more time these days to really dig into it. For the short term I find it somewhat acceptable, even natural, to acknowledge that free-threaded Python doesn't come with the same safety guarantees, and that this is only compounded when using extensions. Therefore, from your point of view, is it acceptable in the short/medium term to give a big notice on this project that it is not thread-safe, namely from lz4 and buffer protocol perspectives?
Again, I would love to dive into this, but TBH I've struggled to cut it in the open-source world and job market, etc., so I have kinda hunkered down to re-evaluate things.
> Therefore, from your point of view, is it acceptable in the short/medium term to give a big notice on this project that it is not thread-safe, namely from lz4 and buffer protocol perspectives?
The LZ4 problem we can fix with the PR I opened. If you're worried about the overhead of acquiring a mutex every time you touch a compressor handle, you could instead make the library raise a runtime error if a thread tries to use a compressor or decompressor context that is owned by another thread (similar to the guarantees of the C library). You can implement that runtime error with a single atomic integer; reading and updating an atomic integer should have negligible overhead.
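To make that concrete, here's a minimal sketch of the atomic-ownership idea (hypothetical names, not part of cramjam): each thread gets a small numeric id from a thread-local, the first thread to touch a handle claims it with a compare-exchange, and any other thread gets an error instead of waiting on a lock.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hands out a unique nonzero id per thread; 0 is reserved for "unowned".
static NEXT_THREAD_ID: AtomicUsize = AtomicUsize::new(1);

thread_local! {
    static THREAD_ID: usize = NEXT_THREAD_ID.fetch_add(1, Ordering::Relaxed);
}

pub struct ThreadOwned {
    owner: AtomicUsize, // 0 = not yet claimed by any thread
}

impl ThreadOwned {
    pub fn new() -> Self {
        Self { owner: AtomicUsize::new(0) }
    }

    /// Ok if the calling thread owns (or just claimed) the handle, Err otherwise.
    pub fn check(&self) -> Result<(), String> {
        let me = THREAD_ID.with(|id| *id);
        match self
            .owner
            .compare_exchange(0, me, Ordering::Relaxed, Ordering::Relaxed)
        {
            Ok(_) => Ok(()),                     // unowned until now; we claimed it
            Err(owner) if owner == me => Ok(()), // we already owned it
            Err(_) => Err("compression context is owned by another thread".to_string()),
        }
    }
}
```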
For the buffer protocol it's probably OK to just rely on the upstream guarantees from PyO3. At some point PyO3 will probably make the PyBuffer bindings unsafe, and then cramjam can deal with the fallout.
I did another pass over the library today and would like to share what I've found to get some feedback.
Thread safety analysis
Use of unsafe impl
The following unsafe impl Send and unsafe impl Sync implementations need to be verified:
± rg "unsafe impl"
src/io.rs
184:unsafe impl Send for PythonBuffer {}
185:unsafe impl Sync for PythonBuffer {}
src/blosc2.rs
217: unsafe impl Send for Compressor {}
356: unsafe impl Send for PySChunk {}
For now I'm going to ignore the experimental blosc2 bindings. However, I did look at blosc2-rs and I think before shipping blosc2 support there needs to be a lot more work to ensure blosc2-rs is thread-safe. I tried going through the c-blosc2 docs a little and don't see much in terms of specifics about how to use the library from multiple threads.
For PythonBuffer - @milesgranger would you be open to refactoring the PythonBuffer type to instead use the PyBuffer<T> type defined by pyo3? Maybe PyBuffer<&dyn Any> would work? I think relying on PyO3 to give you safety here is probably better in the long run than trying to keep up with upstream changes. Another benefit is you could remove cramjam's explicit dependency on pyo3-ffi and all the raw FFI calls into the CPython C API in the library.
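As a rough illustration of what I mean (a hypothetical free function, not cramjam's API, assuming the pyo3 0.23 PyBuffer interface): PyBuffer::get handles requesting and releasing the Python buffer, so the raw pyo3-ffi calls go away.

```rust
use pyo3::buffer::PyBuffer;
use pyo3::prelude::*;

/// Copy the contents of any object implementing the buffer protocol into a Vec.
#[pyfunction]
fn copy_input(py: Python<'_>, data: &Bound<'_, PyAny>) -> PyResult<Vec<u8>> {
    // PyBuffer::get requests the buffer and releases it again when the PyBuffer
    // is dropped, so no manual PyObject_GetBuffer/PyBuffer_Release calls.
    let buf: PyBuffer<u8> = PyBuffer::get(data)?;
    let mut out = vec![0u8; buf.len_bytes()];
    // Bounds-checked copy out of the Python-owned buffer.
    buf.copy_to_slice(py, &mut out)?;
    Ok(out)
}
```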
Mutable global state
I don't see any use of static global variables, let alone any mutable statics.
Non-thread-safe rust types
I don't see any use of RefCell, UnsafeCell, Rc, or other rust types that aren't thread-safe. I also don't see any uses of the interior mutability pattern (am I missing anything?)
Multithreaded testing approach
I've run the tests under TSAN and don't see any races reported. Of course most of the tests are single-threaded so that doesn't mean much.
Unfortunately, hypothesis itself is not thread-safe, so the existing tests that rely on hypothesis to generate data probably won't work well in a multithreaded setup.
Maybe I could re-use test_integration and, instead of compressing and decompressing in a single thread, try compressing and decompressing in multiple threads?
I think I also need to write explicitly multithreaded tests for the compressor objects to make sure it's safe to share them between Python threads.
Do you have any other suggestions for things to focus on for testing?
I think re-using test_integration in an adapted threaded way would be good. Probably also worth it to do the migration to PyBuffer; I can't recall why it wasn't like that from the beginning, though, so maybe I had reasons at the time that I'll remind myself of.
Perfectly fine to ignore blosc2 and other experimental modules; unless I have a sudden influx of time/new job I'm not likely to dive back into that area for some time.
> PyBuffer, I can't recall why it wasn't like that
I don't think pyo3 supported it directly!
> Maybe PyBuffer<&dyn Any> would work?
Hrm, I was attempting it just now, but it's terribly ugly. PyBuffer<&dyn Any> doesn't work because it needs T: pyo3::buffer::Element, and as a bonus Element is not dyn-compatible, so I can't get away with just using a lifetime. So either T needs to bubble up into the BytesType enum, which can't work because that makes BytesType generic, and to my knowledge generics don't work with exposed Python functions; or I'm left with defining a wrapper for every possible element type over concrete PyBuffer<u8>/PyBuffer<i8>/PyBuffer<u32>/... types. Which is sick, because T is phantom in PyBuffer anyhow.
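Roughly the shape I mean, for illustration only (hypothetical type, not actual cramjam code): an enum with one variant per concrete PyBuffer<T>, where every method has to be match-dispatched by hand.

```rust
use pyo3::buffer::PyBuffer;

// One variant per supported element type, because PyBuffer<T> can't go behind
// a trait object (Element is not dyn-compatible).
enum AnyPyBuffer {
    U8(PyBuffer<u8>),
    I8(PyBuffer<i8>),
    U32(PyBuffer<u32>),
    // ... and so on for every element type cramjam wants to accept
}

impl AnyPyBuffer {
    fn len_bytes(&self) -> usize {
        match self {
            AnyPyBuffer::U8(b) => b.len_bytes(),
            AnyPyBuffer::I8(b) => b.len_bytes(),
            AnyPyBuffer::U32(b) => b.len_bytes(),
        }
    }
}
```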
But I'm still poking at it.
Maybe we should have a type-erased PyBuffer in PyO3? I'll raise an issue over there.
Super. Also, Element isn't implemented for bool, so even with my current iteration the different-dtypes test fails for boolean numpy arrays. I did add that impl to a local pyo3 fork and it works from there. But if a type-erased PyBuffer can be done in PyO3, that'd be great and I wouldn't need to make a PR for the Element bool impl.
I looked at adding multithreaded tests today and hit the behavior I describe in this CPython issue comment: https://github.com/python/cpython/issues/129107#issuecomment-2686217709.
I can probably just blacklist using bytearrays on 3.13 in the tests because of the thread-safety issues described in the linked issue; I don't think there's any plan to backport all the thread-safety fixes going into 3.14. I'm not sure whether other 3.13 standard library types might be problematic.
The good news is it looks like I can spawn threads inside the hypothesis tests and that seems to work fine, so as long as I don't try to share hypothesis state between threads I think I can re-use the existing tests. I think sharing the data generated by hypothesis is fine. I'll keep an eye out for issues though.
Next major thing I haven't tried yet is testing what happens when you share stream compressors between threads.
After chatting with @rgommers a little, I think it might be best to wait until the 3.14 beta is released to work more extensively on this.
My main concern is that types implementing the buffer protocol, like bytearray, are not thread-safe in CPython 3.13, and the fixes will probably not be backported. See e.g. https://github.com/python/cpython/issues/129107. This makes writing even relatively simple multithreaded tests based on the existing test suite non-trivial on 3.13. We probably can't use 3.14 until upstream support for 3.14 in PyO3 is merged, but that won't happen until 3.14b1.
That means if we want to e.g. add CI now we'd need to go out of our way to avoid using buggy CPython constructs like bytearray. If we wanted to ship wheels, it would be with major caveats.
Given that cramjam builds on the free-threaded build of 3.13 right now and should work fine as long as no one attempts to use cramjam in a multithreaded context, I think leaving things as-is, with people needing to manually set e.g. PYTHON_GIL=0 to use cramjam without the GIL, is probably the most honest thing we can do. 3.14b1 is due in May, so hopefully we won't need to leave cramjam in a limbo state for too long.
Please let me know if anyone disagrees with that or thinks that e.g. declaring full free-threaded support and shipping cp313t wheels is critical.
Thanks for the update @ngoldbaum, I appreciate your help here. What you describe sounds reasonable, and I'm not aware of anyone pushing for free-threaded releases yet.
Just to chime in - now that I can get a build of cramjam in a free-threaded Python setup, I can continue with testing things over in uproot. We're still waiting for a lot of things to fall into place.
The plan looks fine to me, and it's good to know when to expect free-threaded behavior by default.
3.14.0b1 came out last week and PyO3 0.25 adds support, so we can start iterating on this again now. I'll send in a PR to update PyO3 soon.
@ngoldbaum are we good to close this w/ #226 and the 2.11.0 release w/ wheels for 3.14t?
Yup!
Thanks for the help and motivation to get this done! 👍