Use `PyObject_GetTypeData` without the GIL/attached thread state
Feature or enhancement
Proposal:
In Cython we've historically let people do things like:
cdef class C:
cdef int non_object_attribute
def f(C c):
with nogil:
value = c.non_object_attribute
...
i.e. we've allowed them to get direct access into extension types without the GIL (providing they aren't manipulating Python objects).
When using PEP 697 lookup this becomes a little trickier because access to the attribute has to go through PyObject_GetTypeData. Looking at the implementation of PyObject_GetTypeData, it doesn't appear to need the GIL (except in debug mode where it sanity-checks the type).
I think I can work around this by calculating the offset of the struct myself once at startup (using PyObject_GetTypeData) and then doing all the lookups manually via the offset. It also looks like nanobind use a similar trick in places so I wouldn't be alone in this abuse of the system.
However, it'd be nice not to have to do this and if PyObject_GetTypeData could somehow be exempted from the "C API needs the GIL" rule.
I do understand that people would probably prefer not to do this, so I'm happy to look at other options.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
I asked about this in https://discuss.python.org/t/does-pyobject-gettypedata-need-the-gil/104637/4 - it only generated a small amount of discussion.
If I understand correctly the options are:
- Explicitly allow
PyObject_GetTypeDatato be called without the GIL held. In practice, this might mean getting rid of theassert(PyObject_TypeCheck(obj, cls)); - Have Cython work around the restriction by computing and caching the offset returned by
PyObject_GetTypeData
Neither seems ideal, but I would prefer the first option (don't require the GIL).
Both options could constrain the implementation of PyObject_GetTypeData. In my experience it's easier to maintain and evolve CPython when we are explicit about constraints and downstream dependencies do the straightforward thing. In this case, the second option is more complicated (in Cython) and the constraints are more subtle.
cc @encukou
Other options are:
- Cython re-acquires the GIL each time it looks up an attribute
- Cython just requires the GIL to access cdef class attributes when in this mode (at the cost that some existing code can't be compiled in this mode - but that's maybe kind of OK - not every feature has to work in the Limited API).
That arguably isn't a huge problem if/when free-threading is the only build (because why even bother releasing the GIL/thread-state then), but it isn't great now.
(Edit:) Or continue precalculating the offset in the short-term, and be prepared to re-acquire the GIL in future if you ever need to break the offset.
Explicitly allow
PyObject_GetTypeDatato be called without the GIL held. In practice, this might mean getting rid of theassert(PyObject_TypeCheck(obj, cls));
Let's try that. @colesbury, you probably know a quick, FT-friendly way to check if the GIL is held? We can keep the assert for those cases.
For future extensions I can see, we'll need to require that Py_TYPE(self) and its MRO does not concurrently change.
@colesbury, does free-threading have some protection around __class__ or __bases__ assignment?
Have Cython work around the restriction by computing and caching the offset returned by
PyObject_GetTypeData
Let's not do that.
There's a proposal to have PyObject_GetTypeData(o, cls) depend on Py_TYPE(o) in addition to cls, to support multiple inheritance. It would be tricky to get the cache key right with only current CPython to test on.
Or continue precalculating the offset in the short-term, and be prepared to re-acquire the GIL in future if you ever need to break the offset.
That would break existing compiled extensions -- ones that are tagged as compatible with future CPython versions.
Cython just requires the GIL to access cdef class attributes when in this mode (at the cost that some existing code can't be compiled in this mode - but that's maybe kind of OK - not every feature has to work in the Limited API).
Hmm, that sounds reasonable. How hard would it be to do this in Cython?
colesbury, you probably know a quick, FT-friendly way to check if the GIL is held?
// Only perform the type check if there's an active thread state
assert(_PyThreadState_GET() == NULL || PyObject_TypeCheck(obj, cls));
colesbury, does free-threading have some protection around
__class__or__bases__assignment?
Yes, but it's not going to be helpful here. Reassigning __class__ and __bases__ uses _PyEval_StopTheWorld(), but that only waits for threads with active thread states to pause. So a thread that doesn't have an active PyThreadState (FT) / doesn't hold the GIL isn't affected
Using Py_TPFLAGS_IMMUTABLETYPE would avoid any __bases__ shenanigans.
There's a proposal to have PyObject_GetTypeData(o, cls) depend on Py_TYPE(o) in addition to cls, to support multiple inheritance. It would be tricky to get the cache key right with only current CPython to test on.
It's also a considerably more complicated cache key because it's not just restricted to "exact types that I know about at compile time".
Or continue precalculating the offset in the short-term, and be prepared to re-acquire the GIL in future if you ever need to break the offset.
That would break existing compiled extensions -- ones that are tagged as compatible with future CPython versions.
I'm less worried about that than you - Cython already substitutes unstable C API for unstable Python API in the Stable ABI mode. So from our point of view we don't really expect our stable ABI extensions to work "forever" into the future.
Cython just requires the GIL to access cdef class attributes when in this mode (at the cost that some existing code can't be compiled in this mode - but that's maybe kind of OK - not every feature has to work in the Limited API).
Hmm, that sounds reasonable. How hard would it be to do this in Cython?
It's fairly easy, although it'd have to be a mixture of compile-time and runtime errors.
I've now realised that it would break most of our typed memoryview implementation though. That might be important enough that we'd have to do a special-case hack for that. We can block inheritance for that though.
Using Py_TPFLAGS_IMMUTABLETYPE would avoid any
__bases__shenanigans.
Historically we've made our extension type immutable (because they've been static types). They're currently mutable only when compiled in "heap types" mode. However I doubt many people are relying on it so we could turn it back on (especially for Limited API). It doesn't necessarily apply to any derived types though (thinking about any future multiple inheritance changes)?
I'll have a bit more of a think about alternative implementations on the Cython side though - maybe there's something I'm missing.
Yes, but it's not going to be helpful here. Reassigning
__class__and__bases__uses_PyEval_StopTheWorld(), but that only waits for threads with active thread states to pause. So a thread that doesn't have an activePyThreadState(FT) / doesn't hold the GIL isn't affected
Would stop-the-world with an extra mutex on top be worth thinking about?
So from our point of view we don't really expect our stable ABI extensions to work "forever" into the future.
Hmm, this is a bit off-topic, but, isn't that defeating the point of stable ABI?
So from our point of view we don't really expect our stable ABI extensions to work "forever" into the future.
Hmm, this is a bit off-topic, but, isn't that defeating the point of stable ABI?
You obviously get the benefit of having a single wheel that serves a number of current versions. And it isn't that we intentionally break future versions - we just struggle to implement everything we want in a way we can guarantee keeps working.
But yeah - slightly off-topic so I'll leave that point for now
Reassigning class and bases uses _PyEval_StopTheWorld(), but that only waits for threads with active thread states to pause. So a thread that doesn't have an active PyThreadState (FT) / doesn't hold the GIL isn't affected
The reassigning of __class__ or __bases__ is only problematic if it would change offset returned by PyObject_GetTypeData right?
IIUC the compatible_with_tp_base check in typeobject.c ensures that base can only be changed if the new base has the same tp_basicsize as the old one, so even if base is reassigned, the offset remains correct.
The old class is decref'd; couldn't it get deallocated while PyType's result is still borrowed?
But yeah, I guess an extra mutex won't help with that.
We can cache the value of cls->tp_base->tp_basicsize to a new field cls->tp_base_basic_size given that even if the __base__ of cls is changed, tp_basicsize remains same.
So code would look like:
void *
PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls)
{
return (char *)obj + cls->tp_base_basic_size;
}
With that even if the old base get's deallocated while another thread without holding the gil accesses PyObject_GetTypeData, it will not cause use-after-free.
tp_base_basic_size
The caching works now - its basically what I do as a workaround (but without attaching it to the typeobject).
I think the concern is that they might want to handle more complicated cases in future like multiple inheritance. In that case the offset would depend on both obj and cls.
Obviously the layout couldn't change even if __class___ was changed, but it would still be a problem if you were traversing the type while calculating it.
It might make sense to combine this with tp_traverse-safe API; see https://discuss.python.org/t/105331/7.
My current view on how I'm going to try to deal with this (assuming nothing better comes along):
- Work on the assumption that the offset will be constant for nearly all cases.
- Keep track and set a flag if we ever create an subclass where it isn't true.
- In the likely event that the flag is 0, I can just use the offset for GIL-less lookups.
- In the unlikely event that the flag is 1, I'll fall back to a slower lookup where I reacquire the thread state around each call to
PyObject_GetTypeData.
That isn't perfect, but it means that I can optimize for the simple unchanging layout now, but if that changes in future then Cython modules will still continue to work. I think it's also enough to defer it as an immediate problem that needs solving now (i.e. for the 3.15 Stable ABI).