pyo3 icon indicating copy to clipboard operation
pyo3 copied to clipboard

Tracking issue for no-gil/freethreaded work

Open alex opened this issue 1 year ago • 57 comments

We didn't have a dedicated issue for this, so now there's one.

TODO:

  • [x] Add a cfg for no-gil, but only allowed behind an experimental feature
  • [x] ffi-check passing with a no-GIL build
  • [ ] Adopt new owned-reference-friendly C APIs
    • [x] PyDict_GetItemRef
    • [x] PyList_GetItemRef
    • [ ] PyDict_Next
    • [x] PyWeakref_GetRef
    • [x] PyImport_AddModuleRef
  • Identify places that assume a Python<'_> indicates only a single thread is executing:
    • [x] pyo3::sync::GILOnceCell
    • [x] PyClassBorrowChecker
    • [x] GILProtected
    • [ ] PyErrState::normalize
    • ...
  • [ ] A way for extensions to declare that the Py_mod_gil slot should be set
  • [ ] pyo3_ffi datetime bindings are not thread safe (?)

alex avatar Jun 20 '24 03:06 alex

As a tiny piece of this and to try to learn the library better, I'm working on adding wrappers for the new GetItemRef C API functions in the 3.13 stable API. These are needed to be fully safe for free-threaded python and are nice to have anyway on older versions because strong references are easier to reason about.

ngoldbaum avatar Jul 02 '24 20:07 ngoldbaum

Just to update the current state of things: pyo3 builds against the free-threaded build if you do:

UNSAFE_PYO3_BUILD_FREE_THREADED=1 cargo build

If you use pyenv, you'll also need to locally delete or modify the .python-version file.

This very quicky crashes inside of mimalloc internals, ultimately inside of Py_InitializeEx:

  * frame #0: 0x000000010135af60 libpython3.13t.dylib`chacha_block + 448
    frame #1: 0x000000010134f7e8 libpython3.13t.dylib`_mi_os_get_aligned_hint + 172
    frame #2: 0x000000010135cd68 libpython3.13t.dylib`unix_mmap_prim + 136
    frame #3: 0x0000000101355e80 libpython3.13t.dylib`_mi_prim_alloc + 220
    frame #4: 0x000000010134fb30 libpython3.13t.dylib`mi_os_prim_alloc + 68
    frame #5: 0x0000000101348560 libpython3.13t.dylib`_mi_os_alloc_aligned + 352
    frame #6: 0x0000000101349a9c libpython3.13t.dylib`mi_reserve_os_memory_ex + 80
    frame #7: 0x0000000101347ee8 libpython3.13t.dylib`_mi_arena_alloc_aligned + 392
    frame #8: 0x000000010135bd00 libpython3.13t.dylib`mi_segment_alloc + 468
    frame #9: 0x0000000101354950 libpython3.13t.dylib`mi_segments_page_alloc + 1468
    frame #10: 0x000000010135ab94 libpython3.13t.dylib`mi_page_fresh_alloc + 56
    frame #11: 0x0000000101351f5c libpython3.13t.dylib`mi_find_page + 528
    frame #12: 0x0000000101344070 libpython3.13t.dylib`_mi_malloc_generic + 208
    frame #13: 0x000000010144e3d8 libpython3.13t.dylib`gc_alloc + 284
    frame #14: 0x000000010144e268 libpython3.13t.dylib`_PyObject_GC_New + 96
    frame #15: 0x000000010132042c libpython3.13t.dylib`PyDict_New + 84
    frame #16: 0x00000001013b3d24 libpython3.13t.dylib`_PyUnicode_InitGlobalObjects + 236
    frame #17: 0x000000010147e4dc libpython3.13t.dylib`pycore_interp_init + 72
    frame #18: 0x000000010147bb88 libpython3.13t.dylib`Py_InitializeFromConfig + 1360
    frame #19: 0x000000010147bc9c libpython3.13t.dylib`Py_InitializeEx + 144
    frame #20: 0x000000010006e5f0 pyo3-1eb544a7db3e1a47`pyo3::gil::prepare_freethreaded_python::_$u7b$$u7b$closure$u7d$$u7d$::h922d8fd5db1fd90c((null)={closure_env#0} @ 0x00000001710665c7, (null)=0x0000000171066640) at gil.rs:69:13
    frame #21: 0x0000000100047a4c pyo3-1eb544a7db3e1a47`std::sync::once::Once::call_once_force::_$u7b$$u7b$closure$u7d$$u7d$::h15baf6dd1f7316ea(p=0x0000000171066640) at once.rs:208:40
    frame #22: 0x000000010031a770 pyo3-1eb544a7db3e1a47`std::sys::sync::once::queue::Once::call::heacc08786c6d7dfa at queue.rs:183:21 [opt]
    frame #23: 0x00000001000478b4 pyo3-1eb544a7db3e1a47`std::sync::once::Once::call_once_force::h7b8eb88c3a02f292(self=0x00000001004dce70, f={closure_env#0} @ 0x000000017106671f) at once.rs:208:9
    frame #24: 0x00000001001d68b4 pyo3-1eb544a7db3e1a47`pyo3::gil::prepare_freethreaded_python::h316cd04b406e24c0 at gil.rs:66:5
    frame #25: 0x00000001001d6924 pyo3-1eb544a7db3e1a47`pyo3::gil::GILGuard::acquire::h2127069d9988a593 at gil.rs:174:21
    frame #26: 0x0000000100053000 pyo3-1eb544a7db3e1a47`pyo3::marker::Python::with_gil::h38979cd5e69873c3(f={closure_env#0} @ 0x000000017106679f) at marker.rs:403:21
    frame #27: 0x00000001001c6548 pyo3-1eb544a7db3e1a47`pyo3::conversions::std::array::tests::test_extract_non_iterable_to_array::h3c220ef1fe379cdf at array.rs:226:9
    frame #28: 0x000000010004a1b4 pyo3-1eb544a7db3e1a47`pyo3::conversions::std::array::tests::test_extract_non_iterable_to_array::_$u7b$$u7b$closure$u7d$$u7d$::h7de4fa3687a88518((null)=0x00000001710667fe) at array.rs:225:44

Just to make sure all of this is reproducible and we have some feedback on CI, I think I'm going to add a free-threaded CI job marked with continue-on-error with a test run that crashes like this.

ngoldbaum avatar Jul 30 '24 18:07 ngoldbaum

That sounds great to me, thanks!

davidhewitt avatar Jul 30 '24 21:07 davidhewitt

I added a new checkbox for " Adopt new owned-reference-friendly C APIs". If we have a list of all the ones we need, I can make those sub-checkboxes.

alex avatar Aug 02 '24 14:08 alex

If we have a list of all the ones we need, I can make those sub-checkboxes.

I think PyDict_GetItemRef and PyList_GetItemRef are the most important ones. There'a a listing of the remaining ones in the HOWOTO guide for free-threading in the CPython docs: https://docs.python.org/3.13/howto/free-threading-extensions.html#borrowed-references

I also had a chat with @davidhewitt today and in addition to GilOnceCell, he pointed to GILProtected and PyCell as spots that make strong assumptions about the GIL.

Our first idea is to make GILProtected a no-op on Py_GIL_DISABLED builds (although we'll need to see if that has major fallout on user code) and as a first pass PyCell needs atomic increments and decrements to avoid data races in the free-threaded build.

In addition we need to use pyo3_ffi_check to update the assumptions the FFI bindings make about the free-threaded ABI. Doing this should hopefully fix some of the most egregious build issues. I am planning to work on that step next week.

I looked at adding a failing CI job, but that won't work right now because of if you run the tests on a free-threaded build with --no-fail-fast the tests will eventually deadlock. At least as far as I can see there's no option in cargo to automatically kill hung tests that run longer than a configurable timeout. You can do it manually pretty easily with a macro but I'd prefer not to do that and instead hold off on adding CI until the tests are runnable without deadlocks. Hopefully that won't be too long :)

ngoldbaum avatar Aug 02 '24 18:08 ngoldbaum

Ok, updated the tracking list.

PyCell no longer exists, should that be something else?

alex avatar Aug 02 '24 19:08 alex

I'm still learning the library and it shows...

I think David meant Bound in our discussion and he just got mixed up with the old API after a long day. I'll let him clarify.

ngoldbaum avatar Aug 02 '24 19:08 ngoldbaum

My guess is it's a reference to PyClassBorrowChecker, which manages the various borrow flags. But I'll let David say for sure.

alex avatar Aug 02 '24 19:08 alex

My mistake, yes we removed the PyCell name with the Gil refs API 👍

davidhewitt avatar Aug 03 '24 08:08 davidhewitt

See https://github.com/PyO3/pyo3/pull/4421 which updates the FFI bindings for the free-threaded build. That's enough to get the tests to pass without deadlocking, so I added a CI config as well.

ngoldbaum avatar Aug 06 '24 22:08 ngoldbaum

Added a checkbox for ffi-check being green.

alex avatar Aug 06 '24 22:08 alex

When Python is built with the fix from https://github.com/python/cpython/pull/123052 and PyO3 is built with https://github.com/PyO3/pyo3/pull/4421, I'm seeing a few PyO3 test failures in 3.14t. They don't happen when the tests are run sequentially (i.e., with --test-threads=1)

    gil::tests::test_gil_guard_update_counts
    gil::tests::test_pyobject_drop_without_gil_doesnt_decrease_refcnt
    types::weakref::proxy::tests::callable_proxy::python_class::test_weakref_proxy_behavior
    types::weakref::proxy::tests::proxy::python_class::test_weakref_proxy_behavior

test_gil_guard_update_counts and test_pyobject_drop_without_gil_doesnt_decrease_refcnt: These tests assume that no other thread can update the reference pool because the GIL is held for the duration of the test. That's not true in the free-threaded build -- other tests cases running concurrently may process and clear POOL.pending_decrefs. I'm not sure how that should be addressed.

test_weakref_proxy_behavior: These tests are overwriting the definition class A in the same scope when running concurrently. The fix is to use a local dict for the globals passed to run_bound so that the tests are isolated. Here's what I tested: https://gist.github.com/colesbury/006052f03e033087b513c81ced675f09

colesbury avatar Aug 15 '24 21:08 colesbury

So for the GIL tests, I think basically all of that should be a no-op on no-gil Python, since it's acceptable to do refcounting stuff at all times.

alex avatar Aug 16 '24 14:08 alex

For the weakref tests, seems like a perfectly fine PR.

alex avatar Aug 16 '24 14:08 alex

So for the GIL tests, I think basically all of that should be a no-op on no-gil Python, since it's acceptable to do refcounting stuff at all times.

I fear that this is not completely true, because I believe I understood from @colesbury when speaking about this in the past that we can't do refcounting when GC is running.

davidhewitt avatar Aug 16 '24 14:08 davidhewitt

Hmm, is there documentation on this? I'm not sure I understand how extension modules are supposed to be aware of this.

alex avatar Aug 16 '24 15:08 alex

Yeah, you still need to have a valid thread state when you call any Python C API (with a few exceptions like PyMem_RawMalloc, Py_Initialize, etc.). This is in the Thread State and GIL APIs section of the free-threading HOWTO.

So the pending decrefs POOL is still useful, but it's harder to make assertions about its current state because other threads may be running. Here are two ideas:

  1. Mark the test as #[serial] from the serial_test crate, but I think you'd also need to mark all the other tests as explicitly [#parallel] , which is annoying.
  2. Skip or weaken the assertions in the free-threaded build

colesbury avatar Aug 16 '24 15:08 colesbury

@davidhewitt and I chatted a bit about next steps. We decided that I'll focus next on GILProtected. The reason it exists instead of just using a rust mutex is the GIL can be reentrant, so it's possible for a mutex to deadlock with the GIL. Given that, we think that GILProtected doesn't make any sense in the free-threaded build, so we should remove it if Py_GIL_DISABLED is set. Another idea is to make GilProtected a wrapper for a mutex so supporting both free-threaded and gil-enabled builds is easier.

I'm going to making GILProtected a no-op to see what breaks inside PyO3 itself, presumably fixing it by adding new fine-grained mutexes.

David is going to look at what's necessary for GILOnceCell, which again will likely need to use its own locking.

We'll also likely need to add wrappers for the critical section API and use it for dict and list iteration. See https://github.com/PyO3/pyo3/pull/4439.

We should also look at making the tests run without setting continue_on_error using the normal nox configuration the rest of the tests use. That will mean fixing the broken tests mentioned above.

If anyone wants to help out, working on individual tests that we know need to be updated seems like a good place to start. If you do want to help, just comment in this thread so we don't accidentally work on the same thing at the same time.

ngoldbaum avatar Aug 16 '24 17:08 ngoldbaum

If there's anything that should be added to the checklist at the top, please either add it or yourself or let me know and I can add it! (Same for things that need to be checked off.)

On Fri, Aug 16, 2024 at 1:14 PM Nathan Goldbaum @.***> wrote:

@davidhewitt https://github.com/davidhewitt and I chatted a bit about next steps. We decided that I'll focus next on GILProtected. The reason it exists instead of just using a rust mutex is the GIL can be reentrant, so it's possible for a mutex to deadlock with the GIL. Given that, we think that GILProtected doesn't make any sense in the free-threaded build, so we should remove it if Py_GIL_DISABLED is set. Another idea is to make GilProtected a wrapper for a mutex so supporting both free-threaded and gil-enabled builds is easier.

I'm going to making GILProtected a no-op to see what breaks inside PyO3 itself, presumably fixing it by adding new fine-grained mutexes.

David is going to look at what's necessary for GILOnceCell, which again will likely need to use its own locking.

We'll also likely need to add wrappers for the critical section API and use it for dict and list iteration. See #4439 https://github.com/PyO3/pyo3/pull/4439.

We should also look at making the tests run without setting continue_on_error using the normal nox configuration the rest of the tests use. That will mean fixing the broken tests mentioned above.

If anyone wants to help out, working on individual tests that we know need to be updated seems like a good place to start. If you do want to help, just comment in this thread so we don't accidentally work on the same thing at the same time.

— Reply to this email directly, view it on GitHub https://github.com/PyO3/pyo3/issues/4265#issuecomment-2293867646, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBDNOMFPRDW5YAX4BOTZRYXNTAVCNFSM6AAAAABJTCDPMSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJTHA3DONRUGY . You are receiving this because you authored the thread.Message ID: @.***>

-- All that is necessary for evil to succeed is for good people to do nothing.

alex avatar Aug 16 '24 17:08 alex

@ngoldbaum, @davidhewitt - my hot take would be that GILProtected and GILOnceCell should use Python critical sections.

The reason it exists instead of just using a rust mutex is the GIL can be reentrant, so it's possible for a mutex to deadlock with the GIL

This is exactly the use case PyCriticalSection is designed for -- it avoids deadlocks in the same way that the GIL does. Importantly, it's more than just a reentrant lock. It's released at the same places the GIL would be released, so it won't introduce deadlocks.

There is an awkward bit, which is that we only made the PyCriticalSection_Begin variants that take a PyObject public, and for these use cases it would be a lot cleaner to use PyMutex. I don't think that's a showstopper -- you can either create a dummy PyObject or translate the _PyCriticalSection_BeginMutex function to Rust.

colesbury avatar Aug 16 '24 17:08 colesbury

I tried running the tests many times with a large number of threads, and saw some additional failures not noted above:

https://gist.github.com/ngoldbaum/8452ea9de6dca5542ac24f2aa88ad7d3

The ones failing with BorrowMutError that have lazy_type_object.rs in the traceback will be fixed once GILProtected is fixed.

For the other ones, It looks like PyO3 is somehow able to manipulate a python module object before Python is finished setting it up?

ngoldbaum avatar Aug 21 '24 20:08 ngoldbaum

In addition to the errors in the gist above, cargo stress also seems to hit "Access to the GIL is prohibited while a __traverse__ implmentation is running." panics. See this gist.

This one I'm also not sure about. Shouldn't it not be possible to acquire a thread state during a GC pass (__traverse__ is running on another thread)? I thought that this failure might be caused by GIL_COUNT transiently going negative somehow, it also happens if I change GIL_LOCKED_DURING_TRAVERSE to something very large and negative.

Also, as an aside, the naming here becomes confusing in the free-threaded build. We're probably going to eventually need to change the names of all these types and static variables that have GIL in the name to instead have names that pertain to whether the interpreter or a rust thread owns the Python thread state.

ngoldbaum avatar Aug 22 '24 19:08 ngoldbaum

I think the issue is this line in _call_traverse, but I'm not entirely confident in my analysis:

https://github.com/PyO3/pyo3/blob/2f5b45efa10e52a44bb3185dfeb716c2e815a4f8/src/impl_/pymethods.rs#L256

This creates a PyRef<T> which gets dropped (calling Py_DECREF()) implicitly some time after it's moved:

https://github.com/PyO3/pyo3/blob/2f5b45efa10e52a44bb3185dfeb716c2e815a4f8/src/impl_/pymethods.rs#L262

PyO3 works to ensure that __traverse__ implementations can't create PyRef<T> values, but the _call_traverse wrapper doesn't adhere to that.

colesbury avatar Aug 22 '24 23:08 colesbury

@ngoldbaum, @davidhewitt - my hot take would be that GILProtected and GILOnceCell should use Python critical sections.

Thanks - I will study the critical sections in more detail ASAP. I'm beginning to think that the first pass of PyO3's support for freethreading will be along the lines of "it's sound but the primitives may not be the best for UX" and then we can improve over time.

In addition to the errors in the gist above, cargo stress also seems to hit "Access to the GIL is prohibited while a __traverse__ implmentation is running." panics. See this gist.

I believe what is happening is that our _call_traverse handler is being called "directly" instead of actually inside a GC run, but how that leads to panics is interesting.

PyO3 works to ensure that __traverse__ implementations can't create PyRef<T> values, but the _call_traverse wrapper doesn't adhere to that.

You're 100% correct that there must be a bug here due to the refcounting op, but I think that the panic is a separate issue.

davidhewitt avatar Aug 23 '24 12:08 davidhewitt

I believe what is happening is that our _call_traverse handler is being called "directly" instead of actually inside a GC run, but how that leads to panics is interesting.

In Nathan's linked gist, the _call_traverse is called from the GC (frames 41-55) as expected. The panic message is because dropping the object (borrow) calls Py_DECREF, which leads to the object being deallocated. The dealloc wrapper function calls GILGuard::assume(), which panics with the seen message.

I am still fuzzy as to when and why the drop/decref of borrow leads to deallocation -- most of the time the drop/decref doesn't lead to the PyObject having zero refcount.

colesbury avatar Aug 23 '24 15:08 colesbury

Ah yes, agreed. So I suspect that the going to 0 might be related to the fact that creation of the pyref (an incref) is also UB under traverse. I'm working on a patch to clean this all up.

davidhewitt avatar Aug 23 '24 16:08 davidhewitt

For the other ones, It looks like PyO3 is somehow able to manipulate a python module object before Python is finished setting it up?

To elaborate, this is talking about test failures like this:

---- instance::tests::pystring_attr stdout ----
Error: PyErr { type: <class 'AttributeError'>, value: AttributeError("module '' has no attribute 'a'"), traceback: None }

---- types::typeobject::tests::test_type_names_nested stdout ----
thread 'types::typeobject::tests::test_type_names_nested' panicked at src/types/typeobject.rs:370:60:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'AttributeError'>, value: AttributeError("module 'test_module' has no attribute 'OuterClass'"), traceback: None }

---- types::any::tests::test_call_method0 stdout ----
thread 'types::any::tests::test_call_method0' panicked at src/types/any.rs:1723:62:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'AttributeError'>, value: AttributeError("module 'test_module' has no attribute 'SimpleClass'"), traceback: None }

I think this is another case of pyo3 tests using implicit global state.

As an example, test_type_names_nested_stdout uses PyModule::from_code to create a python class dynamically for the test from a string. The implementation of from_code is two FFI calls:

https://github.com/PyO3/pyo3/blob/82fa359d96732f90a210b3d6d879e17e0fe0aa2c/src/types/module.rs#L168-L173

I think the error happens when one thread is in the middle of reloading members of the module that another thread assumes is already initialized. The docs for PyImport_ExecCodeModule indicates that this function does reload modules.

I'm not sure if this indicates a soundness issue in the CPython C API, PyO3, or if this is just something that's expected to happen when you manipulate global state like this. One workaround is generating unique module names in tests that dynamically define modules.

ngoldbaum avatar Aug 28 '24 16:08 ngoldbaum

Generating unique module names seems like a suitable workaround to me. My feeling is that this probably isn't unsound in isolation (as long as other threads aren't making assumptions about modules having particular state, but that's probably unsound to assume in general).

Regarding GILOnceCell / GILProtected, I found some time to begin experimenting with them this morning and the critical section API. I'm out of time for now, the summary of my thoughts are:

Critical Sections

  • Critical sections require the begin / end pair which implies a closure-style API with_critical_section(|| { /* do_work */ }).
  • The fact that critical sections get suspended implicitly if an inner critical section is entered makes me believe we cannot use critical sections to hand out &mut references, because the suspension may violate the &mut guarantee of exclusivity (if a competing thread then acquires and mutates the locked value during the suspension).

cc @colesbury maybe you can clarify if I'm right here? I take this understanding from the docs:

You cannot use nested critical sections to lock more than one object at once, because the inner critical section may suspend the outer critical sections. This API does not provide a way to lock more than two objects at once.

... but it's not clear to me under what conditions the inner critical section may suspend outer critical sections (other than during GC).

GILProtected

  • Because of the closure-style API, GILProtected cannot work on the freethreaded build without an API break. I suggest we remove the type on the freethreaded build. Users probably need to use atomics / fine grained locks instead.
    • With the locks there is a risk of deadlocking with the GC (or even the GIL on gil-enabled builds), so we may need to introduce a safe pyo3::sync::PyMutex<T>.

GILOnceCell

  • We probably want to give this type a new name, e.g. PyOnceLock, now that std::sync::OnceLock exists. This can be done globally and deprecate the old name.
  • Critical sections could fit this use case, but I think we don't need those semantics and a mutex would be simpler.
  • Current semantics of the GIL-enabled build is that:
    • Multiple threads may race to build the initial value (requires an implicit GIL thread switch, so somewhat unlikely).
    • Only one thread will ever complete the write (guaranteed by the GIL).
    • Reentrant initialization is undefined (at the moment it would stack overflow like any other recursive algorithm).
    • Getting the value, once acquired, is cheap.
  • To mirror those semantics best, I think the solution would be:
    • use atomic reads to check if initialized, and if not, proceed to start generating the initial value
    • once the initial value is created, use a local lock to both write the value and set the atomic marker so future reads know it's set.
  • Especially if we rename the type, we might want to consider semantics a little closer to OnceLock, where other threads get blocked while one thread is initializing the value.
    • It seems to me that to avoid risks of deadlocking with the GC purposes we would need a per-GILOnceCell critical section to create these semantics. But I think whether that is feasible depends on what "the inner critical section may suspend the outer critical sections" means as per above.

davidhewitt avatar Aug 29 '24 07:08 davidhewitt

... for what it's worth, I think in the long run we could also migrate most uses of GILOnceCell away by using module state, so we might not need to worry too much about semantics here. Maybe it's better to go with GILOnceCell working as similar as possible on both builds, and introduce PyOnceLock later as a separate type with different semantics if users need it.

davidhewitt avatar Aug 29 '24 07:08 davidhewitt

The fact that critical sections get suspended implicitly if an inner critical section is entered makes me believe we cannot use critical sections to hand out &mut references, because the suspension may violate the &mut guarantee of exclusivity... Critical sections require the begin / end pair which implies a closure-style API...

I don't understand the constraints here with closure-style API vs RAII-style (like sync.Mutex).

You are right that there are some hazards when dealing with multiple critical sections. For example, I think the situation we'd want to avoid is something like:

with_critical_section(obj1, || {
  // obj1 is locked here
  with_critical_section(obj2, || {
    // obj2 is locked here, but obj1 might not be!
  });
  // obj1 is locked here
});

Note that this is a different hazard than something like the following where a drop() temporarily, internally suspends a critical section. That is equivalent to a drop() temporarily, internally releasing (and re-acquiring) the GIL, so I don't this is a problem:

with_critical_section(obj1, || {
  // obj1 is locked here
  drop(some_py_object); // might call finalizers and internally unlock & re-lock obj1
  // obj1 is locked here
});

I can think of a few possible strategies:

  • For GILOnceCell, instead of having a lock per-cell, we have one designated global object that get's locked by all critical sections.
  • We add dynamically checked rules (like RefCell) that prevent or limit nested critical sections per-thread. For example, we disallow PyO3 code from (dynamically) having more than one PyO3-managed critical section active per-thread.
  • Some magic with PyO3 lifetimes? I'm not sure what's possible here.

Also, although I think PyCriticalSection would be useful for GILOnceCell, I'm not sure PyO3 needs to expose it as a safe API, at least not initially. It might be fine to just add (unsafe) ffi bindings.

Because of the closure-style API, GILProtected cannot work on the freethreaded build without an API break. I suggest we remove the type on the freethreaded build.

GILProtected looks like it has fairly limited use, so removing it in the free-threaded build might not be too disruptive.

GILOnceCell...

Preserving the existing semantics to the extent that's possible makes sense to me. GILOnceCell looks like it is more widely used than GILProtected.

The strategy you outline for GILOnceCell makes sense to me. The important question to me is: how much does it matter if multiple threads race to build the initial value? If it doesn't matter at all, then you probably don't need to bother with critical sections. If it does matter -- because it's currently somewhat unlikely -- than critical sections might be useful as a way to ensure that the init function are mostly executed only once without introducing deadlock risk.

colesbury avatar Aug 29 '24 18:08 colesbury