Python thread local variables are reset in callbacks from a rust thread
Bug Description
If a Python function is called from a thread that was spawned by Rust, Python thread-local variables do not persist across calls.
Steps to Reproduce
The following module defines a Python extension function that spawns a thread and repeatedly calls a Python callback:
use pyo3::prelude::*;
use std::thread;

#[pyfunction]
fn call_callback(py: Python, callback: Py<PyAny>) -> PyResult<()> {
    py.detach(move || {
        let handle = thread::spawn(move || {
            for _ in 0..10 {
                Python::attach(|py| callback.call0(py)).unwrap();
            }
        });
        handle.join().unwrap();
    });
    Ok(())
}

#[pymodule]
fn pyo3_bug(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(call_callback, m)?)?;
    Ok(())
}
In a corresponding python callback function we modify a thread-local variable:
import os
import threading

import pyo3_bug

tls = threading.local()

def func():
    thread_name = threading.current_thread().name
    thread_id = threading.get_ident()
    print(f"Called from PID {os.getpid()}, thread {thread_name} with ident {thread_id}")
    print(f"Value of thread local storage: {getattr(tls, 'seen', False)}")
    tls.seen = True

print("=== Calling from rust")
pyo3_bug.call_callback(func)

def call_from_py():
    for _ in range(10):
        func()

print("=== Calling from python")
thread = threading.Thread(target=call_from_py)
thread.start()
The output is
=== Calling from rust
Called from PID 649992, thread Dummy-1 with ident 140075138545344
Value of thread local storage: False
Called from PID 649992, thread Dummy-1 with ident 140075138545344
Value of thread local storage: False
Called from PID 649992, thread Dummy-1 with ident 140075138545344
Value of thread local storage: False
...
=== Calling from python
Called from PID 649992, thread Thread-2 (call_from_py) with ident 140075136444096
Value of thread local storage: False
Called from PID 649992, thread Thread-2 (call_from_py) with ident 140075136444096
Value of thread local storage: True
Called from PID 649992, thread Thread-2 (call_from_py) with ident 140075136444096
Value of thread local storage: True
...
If the callback is called from Python, the thread-local variable correctly retains its state. If called from Rust, the thread-local variable is reset every time.
Your operating system and version
arch linux
Your Python version (python --version)
3.12.9
Your Rust version (rustc --version)
1.90.0
Your PyO3 version
0.26.0
How did you install python? Did you use a virtualenv?
virtualenv with uv
Additional Info
No response
I have the feeling this may be due to acquiring the GIL multiple times during Python::attach
I think you are right here. This version with the attach moved round the loop works as intended.
#[pyfunction]
fn call_callback(py: Python, callback: Py<PyAny>) -> PyResult<()> {
    py.detach(move || {
        let handle = std::thread::spawn(move || {
            Python::attach(|py| {
                for _ in 0..5 {
                    callback.call0(py).unwrap();
                }
            });
        });
        handle.join().unwrap();
    });
    Ok(())
}
I think Python considers the thread "dead" after PyGILState_Release is called (which happens when the attach closure ends) and GCs its TLS state. At least, that is my understanding after reading https://github.com/python/cpython/issues/130394#issuecomment-2675806926
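If the Rust side cannot be changed, one possible Python-side workaround is to key the state on the thread ident instead of using threading.local, since the output above shows the ident staying the same across the Rust callbacks. This is only a sketch under that assumption: `_state_by_ident` and `thread_state` are hypothetical names, entries are never cleaned up, and an ident can be reused once the thread has exited.

```python
import threading

# Hypothetical registry: thread ident -> per-thread state dict.
_state_lock = threading.Lock()
_state_by_ident = {}

def thread_state():
    """Return the state dict for the calling thread, creating it on first use."""
    ident = threading.get_ident()
    with _state_lock:
        return _state_by_ident.setdefault(ident, {})

def func():
    """Report whether this thread has called func() before, then mark it as seen."""
    state = thread_state()
    seen = state.get("seen", False)
    state["seen"] = True
    return seen
```

Called repeatedly from the same thread, func() should return False once and True afterwards, independent of whether the Python thread state was recreated in between.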
Ah, I think I get it.
I need an outer Python::attach that keeps the python ThreadState alive.
Maybe it would be possible to add a register_thread function in pyo3 that stores a reference to a Python ThreadState in a thread-local variable, so that the ThreadState stays alive for the lifetime of the Rust thread? No idea if the rules for dropping thread-local variables allow that.
Hello! I'm not intimately familiar with this project so I'm not sure whether this is relevant, but the problem can be reproduced using both no-GIL and GIL Python.
I think there might be another possible factor going on here (not tested).
If Python is creating the thread states with PyThreadState_New and PyThreadState_Swap, without using the PyGILState APIs, then I think it might be the case that PyO3 detaches that thread state and creates a new separate thread state implicitly when calling PyGILState_Ensure.
If that's the case, we probably need to figure out a way to wire up the same thread state inside the py.detach calls.
This is also related to #3646, which proposes that ALL TLS is reset when running inside a py.detach() call. I can see that that would be unhelpful, however (and it is the main reason why that PR got stuck) 🤔
I wonder if the new threads_inherit_context option has any bearing here: https://docs.python.org/3/using/cmdline.html#envvar-PYTHON_THREAD_INHERIT_CONTEXT. Maybe storing state in context variables would help? The option is on by default on the free-threaded build but needs to be opted into on the GIL-enabled build. Presumably in the future it will become the default because IMO it makes context managers and asyncio behave in a way that makes syntactic sense in Python.
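A minimal sketch of what storing the flag in a context variable could look like. `seen_var` and `func_ctx` are hypothetical names. Whether threads_inherit_context changes anything for a Rust-spawned thread is not tested here; the sketch sidesteps that question by running the callback inside an explicit contextvars.Context object owned by the caller, so the value lives in that object rather than in any thread state.

```python
import contextvars

# Hypothetical context variable replacing the `tls.seen` attribute.
seen_var = contextvars.ContextVar("seen", default=False)

def func_ctx():
    """Report whether the flag was already set in the current context, then set it."""
    seen = seen_var.get()
    seen_var.set(True)
    return seen

# An explicit, initially empty context that the caller keeps alive.
# Changes made during ctx.run(...) are stored in ctx itself.
ctx = contextvars.Context()
first = ctx.run(func_ctx)   # flag not yet set in ctx
second = ctx.run(func_ctx)  # flag set by the previous run
```

The callback handed to Rust could then be something like `lambda: ctx.run(func_ctx)`. Note that a Context object cannot be entered by two threads at the same time, so this only fits the single-callback-thread case shown in this issue.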