cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Types with Py_TPFLAGS_MANAGED_WEAKREF but not Py_TPFLAGS_HAVE_GC crash when creating a weak reference

Open chris-se opened this issue 7 months ago • 16 comments

Crash report

What happened?

Take the following simple example C extension:

#define PY_SSIZE_T_CLEAN
#include <Python.h>

typedef struct {
    PyObject_HEAD
    /* Type-specific fields go here. */
} CustomObject;

static PyTypeObject CustomType = {
    .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "weak_bug_repro.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_MANAGED_WEAKREF,
    .tp_new = PyType_GenericNew,
};

static int
weak_bug_repro_exec(PyObject* m)
{
    if (PyType_Ready(&CustomType) < 0) {
        return -1;
    }

    if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
        return -1;
    }

    return 0;
}

static PyModuleDef_Slot weak_bug_repro_module_slots[] = {
    {Py_mod_exec, weak_bug_repro_exec},
    {0, NULL}
};

static PyModuleDef weak_bug_repro_module = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "weak_bug_repro",
    .m_size = 0,
    .m_slots = weak_bug_repro_module_slots,
};

PyMODINIT_FUNC
PyInit_weak_bug_repro(void)
{
    return PyModuleDef_Init(&weak_bug_repro_module);
}

After compiling, running the following Python code will crash:

import weak_bug_repro
import weakref

obj = weak_bug_repro.Custom()
ref = weakref.ref(obj)

Backtrace for the crash (via gdb on Linux, although it also crashes on macOS, and probably any platform):

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7a0b14b in get_basic_refs (head=0x2000000000000000, refp=refp@entry=0x7fffffffd298, proxyp=proxyp@entry=0x7fffffffd290) at Objects/weakrefobject.c:283
283         if (head != NULL && head->wr_callback == NULL) {
(gdb) bt
#0  0x00007ffff7a0b14b in get_basic_refs (head=0x2000000000000000, refp=refp@entry=0x7fffffffd298, proxyp=proxyp@entry=0x7fffffffd290) at Objects/weakrefobject.c:283
#1  0x00007ffff7a0b6b6 in try_reuse_basic_ref (list=<optimized out>, type=type@entry=0x7ffff7de9ae0 <_PyWeakref_RefType>, callback=callback@entry=0x0) at Objects/weakrefobject.c:338
#2  0x00007ffff7a0b8d2 in get_or_create_weakref (type=type@entry=0x7ffff7de9ae0 <_PyWeakref_RefType>, obj=0x7ffff6e55650, callback=0x0) at Objects/weakrefobject.c:428
#3  0x00007ffff7a0b956 in weakref___new__ (type=0x7ffff7de9ae0 <_PyWeakref_RefType>, args=<optimized out>, kwargs=<optimized out>) at Objects/weakrefobject.c:467
#4  0x00007ffff79b529d in type_call (self=0x7ffff7de9ae0 <_PyWeakref_RefType>, args=0x7ffff6e78780, kwds=0x0) at Objects/typeobject.c:2291
#5  0x00007ffff7926168 in _PyObject_MakeTpCall (tstate=tstate@entry=0x7ffff7e57200 <_PyRuntime+331232>, callable=callable@entry=0x7ffff7de9ae0 <_PyWeakref_RefType>, args=args@entry=0x7fffffffd608, nargs=<optimized out>, 
    keywords=keywords@entry=0x0) at Objects/call.c:242
#6  0x00007ffff792639b in _PyObject_VectorcallTstate (tstate=0x7ffff7e57200 <_PyRuntime+331232>, callable=callable@entry=0x7ffff7de9ae0 <_PyWeakref_RefType>, args=args@entry=0x7fffffffd608, nargsf=<optimized out>, 
    nargsf@entry=9223372036854775809, kwnames=kwnames@entry=0x0) at ./Include/internal/pycore_call.h:167
#7  0x00007ffff7926415 in PyObject_Vectorcall (callable=callable@entry=0x7ffff7de9ae0 <_PyWeakref_RefType>, args=args@entry=0x7fffffffd608, nargsf=9223372036854775809, kwnames=kwnames@entry=0x0) at Objects/call.c:327
#8  0x00007ffff7a5cd9b in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=0x7ffff7fb7020, throwflag=0) at Python/generated_cases.c.h:1619
#9  0x00007ffff7a7b128 in _PyEval_EvalFrame (tstate=tstate@entry=0x7ffff7e57200 <_PyRuntime+331232>, frame=frame@entry=0x7ffff7fb7020, throwflag=throwflag@entry=0) at ./Include/internal/pycore_ceval.h:119
#10 0x00007ffff7a7b2e8 in _PyEval_Vector (tstate=tstate@entry=0x7ffff7e57200 <_PyRuntime+331232>, func=func@entry=0x7ffff6e66750, locals=locals@entry=0x7ffff6e74050, args=args@entry=0x0, argcount=argcount@entry=0, 
    kwnames=kwnames@entry=0x0) at Python/ceval.c:1961
#11 0x00007ffff7a7b3c5 in PyEval_EvalCode (co=co@entry=0x7ffff6fca260, globals=globals@entry=0x7ffff6e74050, locals=locals@entry=0x7ffff6e74050) at Python/ceval.c:853
#12 0x00007ffff7aeda93 in run_eval_code_obj (tstate=tstate@entry=0x7ffff7e57200 <_PyRuntime+331232>, co=co@entry=0x7ffff6fca260, globals=globals@entry=0x7ffff6e74050, locals=locals@entry=0x7ffff6e74050) at Python/pythonrun.c:1365
#13 0x00007ffff7aedc47 in run_mod (mod=mod@entry=0x555555680260, filename=filename@entry=0x7ffff6eed9a0, globals=globals@entry=0x7ffff6e74050, locals=locals@entry=0x7ffff6e74050, flags=flags@entry=0x7fffffffd9d8, 
    arena=arena@entry=0x7ffff6ee4040, interactive_src=0x0, generate_new_source=0) at Python/pythonrun.c:1436
#14 0x00007ffff7aee4ab in pyrun_file (fp=fp@entry=0x55555555b650, filename=filename@entry=0x7ffff6eed9a0, start=start@entry=257, globals=globals@entry=0x7ffff6e74050, locals=locals@entry=0x7ffff6e74050, closeit=closeit@entry=1, 
    flags=0x7fffffffd9d8) at Python/pythonrun.c:1293
#15 0x00007ffff7aefcd6 in _PyRun_SimpleFileObject (fp=fp@entry=0x55555555b650, filename=filename@entry=0x7ffff6eed9a0, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffd9d8) at Python/pythonrun.c:521
#16 0x00007ffff7aefec7 in _PyRun_AnyFileObject (fp=fp@entry=0x55555555b650, filename=filename@entry=0x7ffff6eed9a0, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffd9d8) at Python/pythonrun.c:81
#17 0x00007ffff7b17aef in pymain_run_file_obj (program_name=program_name@entry=0x7ffff6eeda10, filename=filename@entry=0x7ffff6eed9a0, skip_source_first_line=0) at Modules/main.c:410
#18 0x00007ffff7b17bff in pymain_run_file (config=config@entry=0x7ffff7e222b8 <_PyRuntime+114328>) at Modules/main.c:429
#19 0x00007ffff7b186cf in pymain_run_python (exitcode=exitcode@entry=0x7fffffffdb3c) at Modules/main.c:694
#20 0x00007ffff7b18910 in Py_RunMain () at Modules/main.c:775
#21 0x00007ffff7b1896b in pymain_main (args=args@entry=0x7fffffffdb80) at Modules/main.c:805
#22 0x00007ffff7b189eb in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:829
#23 0x0000555555555142 in main (argc=<optimized out>, argv=<optimized out>) at ./Programs/python.c:15

I've initially seen this in Python 3.12 (where Py_TPFLAGS_MANAGED_WEAKREF was introduced), but I've just tested it against the current main branch, and it still crashes there.

The reason for the crash is the following:

  • PyType_GenericAlloc (default tp_alloc of all types) allocates the object by calling _PyType_AllocNoTrack (Objects/typeobject.c)
  • _PyType_AllocNoTrack uses _PyType_PreHeaderSize to determine how many bytes to allocate in front of the PyObject structure (Objects/typeobject.c)
  • _PyType_PreHeaderSize returns the size of 2 pointers if either Py_TPFLAGS_MANAGED_WEAKREF or Py_TPFLAGS_MANAGED_DICT is a flag plus sizeof(PyGC_Head) if the object supports GC traversal, which is also the size of 2 pointers (technically 2 times uintptr_) (Include/internal/pycore_object.h)
  • However, MANAGED_WEAKREF_OFFSET is hard-defined to be (((Py_ssize_t)sizeof(PyObject *))*-4), which is only correct if the PyGC_Head struct is also present, and that is put into tp_weaklistoffset of the object
  • And GET_WEAKREFS_LISTPTR used in Objects/weakrefobject.c for creating a weak reference then reads from memory that comes before the area that malloc() allocated from, causing an invalid memory read, causing the crash

In Python 3.12 this would always crash, in 3.13+ this would only crash if GIL is not disabled at compile time, because Include/internal/pycore_typeobject.h defines MANAGED_WEAKREF_OFFSET to be (((Py_ssize_t)sizeof(PyObject *))*-2) if Py_GIL_DISABLED is defined.

The correct behavior should be be to set the tp_weaklistoffset correctly according to whether _PyType_IS_GC is true or not - but since there are several explicit checks for tp_weaklistoffset against the constant MANAGED_WEAKREF_OFFSET, this will probably require some level of refactoring.

CPython versions tested on:

3.12

Operating systems tested on:

No response

Output from running 'python -VV' on the command line:

No response

Linked PRs

  • gh-135863

chris-se avatar May 27 '25 09:05 chris-se

I propose to change requirements for Py_TPFLAGS_MANAGED_WEAKREF and also force checking of having Py_TPFLAGS_HAVE_GC set for Py_TPFLAGS_MANAGED_WEAKREF and Py_TPFLAGS_MANAGED_DICT.

Please take a look at linked PR.

sergey-miryanov avatar Jun 23 '25 20:06 sergey-miryanov

Although Py_TPFLAGS_MANAGED_* are not limited C API, some docs have encouraged the flags since 3.12, and the tutorial has required Py_TPFLAGS_MANAGED_WEAKREF on a static type instead of setting tp_weaklistoffset. Then, MANAGED_WEAKREF_OFFSET should be more flexible to be consistent.

Py_TPFLAGS_MANAGED_DICT has little to do with this issue, as it requires a heaptype and the type is now supposed to be used with the gc protocol.

neonene avatar Jun 25 '25 08:06 neonene

@neonene Yeah, Py_TPFLAGS_MANAGED_DICT doesn't relate to this issue. But if we omit Py_TPFLAGS_HAVE_GC when use Py_TPFLAGS_MANAGED_DICT then we also can get segfault. I propose to add checking of presence of the gc flag for both cases for dict and weakref.

sergey-miryanov avatar Jun 25 '25 08:06 sergey-miryanov

In some sense it is breaking change. I understand.

sergey-miryanov avatar Jun 25 '25 08:06 sergey-miryanov

I propose to add checking of presence of the gc flag for both cases for dict and weakref.

Just to avoid a crash, I would use NotImplementedError keeping the design unchanged.

neonene avatar Jun 26 '25 04:06 neonene

@neonene I'm not sure I get you. Can you elaborate?

sergey-miryanov avatar Jun 26 '25 05:06 sergey-miryanov

Static types are not discouraged from using tp_dictoffset since the alternative is not available:

https://github.com/python/cpython/blob/033aa5cfd856e52d5b10fc765631c16b308ee4f1/Doc/c-api/typeobj.rst?plain=1#L1845-L1848

Even on heap types without the GC flag, authors should go back to the tp slot because the Py_TPFLAGS_MANAGED_DICT handling can be considered to be not implemented yet with very low priority. Raising an exception to work around a possible crash seems fine when missing Py_TPFLAGS_HAVE_GC, but requiring/recommending the flag in the message/doc will make the things unnecessarily unrecoverable. Also, there must be a strong reason the author avoids the GC. So, I would use NotImplementedError as "under implementation" without touching the coredevs' original design as much as possible, including the documentation.

The same applies more seriously to the Py_TPFLAGS_MANAGED_WEAKREF case. As for the priority of the non-GC version of MANAGED_WEAKREF_OFFSET, I just rely on coredevs.

neonene avatar Jun 26 '25 19:06 neonene

I mostly agree with you. Some clarification to be sure we speak the same. Py_TPFLAGS_MANAGED_DICT implies Py_TPFLAGS_HAVE_GC and this is documented. If we don't want to raise a TypeError exception, then we can add an assert instead.

Py_TPFLAGS_MANAGED_WEAKREF not officially implies Py_TPFLAGS_HAVE_GC at this time, but code written as it should imply gc-flag.

sergey-miryanov avatar Jun 26 '25 19:06 sergey-miryanov

Py_TPFLAGS_MANAGED_DICT implies Py_TPFLAGS_HAVE_GC and this is documented.

Yes. Keep the original remarks as-is. Py_TPFLAGS_HAVE_GC is recommended by using "should," since Py_TPFLAGS_MANAGED_DICT accepts only heaptypes as I said above. (EDIT: since the dict can make reference cycles, strictly?)

However, your PR enforces the gc flag in some parts (title, description, NEWS, code). The OP has pointed out the inconsistency (should vs. must) there. Are you aware of the problem?

If we don't want to raise a TypeError exception, then we can add an assert instead.

The PR raises SystemError, and my suggestion is replacing it with NotImplementedError. Assertion cannot avoid a crash in non-debug builds, which should be against your concern.

Py_TPFLAGS_MANAGED_WEAKREF not officially implies Py_TPFLAGS_HAVE_GC at this time, but code written as it should imply gc-flag.

No. This issue is mainly about the static types in 3rd party extension modules, where Py_TPFLAGS_MANAGED_WEAKREF is highly encouraged.

neonene avatar Jun 26 '25 23:06 neonene

Thanks for clarification!

I'm not sure I get thin difference between should and must here. https://github.com/python/cpython/blob/033aa5cfd856e52d5b10fc765631c16b308ee4f1/Doc/c-api/typeobj.rst?plain=1#L1205-L1212

I read this as if Py_TPFLAGS_HAVE_GC flag is not set then Py_TPFLAGS_MANAGED_DICT will not work properly and bad things will happen. As I see Py_TPFLAGS_MANAGED_DICT maybe used with static types too, not only with heaptypes.

I don't believe NotImplementedError suitable here (anyway I open for any changes). I read it as it is not implemented now but may be implemented in the future. I'm not sure it will be implemented in such way. But who knows.

The same I want to add to Py_TPFLAGS_MANAGED_WEAKREF or get direction from coredevs that we should fix implementation.

sergey-miryanov avatar Jun 27 '25 03:06 sergey-miryanov

From my perspective, as the one who ran into this issue and reported it:

I'm not sure I get thin difference between should and must here.

cpython/Doc/c-api/typeobj.rst

Lines 1205 to 1212 in 033aa5c .. c:macro:: Py_TPFLAGS_MANAGED_DICT

   This bit indicates that instances of the class have a `~object.__dict__` 
   attribute, and that the space for the dictionary is managed by the VM. 

   If this flag is set, :c:macro:`Py_TPFLAGS_HAVE_GC` should also be set. 

   .. versionadded:: 3.12 

I read this as if Py_TPFLAGS_HAVE_GC flag is not set then Py_TPFLAGS_MANAGED_DICT will not work properly and bad things will happen.

No, in my eyes should implies that in the vast majority of cases the Py_TPFLAGS_HAVE_GC flag will also be set if the Py_TPFLAGS_MANAGED_DICT flag is set, but there might be specific exceptions where this doesn't apply. In contrast must implies that there is no situation in which Py_TPFLAGS_MANAGED_DICT can be set without the Py_TPFLAGS_HAVE_GC flag being also set.

If you throw an error when Py_TPFLAGS_MANAGED_DICT is set but not Py_TPFLAGS_HAVE_GC, then you've implicitly created a must relation between the flags. If it's a NotImplementedError there is an indication that the must relation is due to a current (transient) technical limitation and not a design choice, so the documentation may still say should in that case.

While the Python documentation (to my knowledge) does not officially reference this, I think RFC 2119 provides a good framework for how words such as must or should are to be used in technical documentations and/or specifications.

I don't believe NotImplementedError suitable here (anyway I open for any changes). I read it as it is not implemented now but may be implemented in the future. I'm not sure it will be implemented in such way. But who knows.

I think there are three sane options here:

  • Either you throw a NotImplementedError and leave the documentation mostly as-is, so that the documentation is treated as lore, i.e. it should be possible to use the 2 flags without the Py_TPFLAGS_HAVE_GC flag, but the current implementation is not there
  • Or you throw a different error and update the documentation that it is now enforced that the Py_TPFLAGS_HAVE_GC flag has to also be set
  • Or you actually change the code to allow the flags to be set without the Py_TPFLAGS_HAVE_GC flag

From my personal perspective:

I don't think Py_TPFLAGS_MANAGED_DICT makes much sense without Py_TPFLAGS_HAVE_GC, so I think here the documentation should be upgraded to use must and a SystemError (or similar, but not NotImplementedError) should be raised when they are not used in conjunction.

On the other hand I think it can make sense to have Py_TPFLAGS_MANAGED_WEAKREF without Py_TPFLAGS_HAVE_GC (think a third-party library representing some resource handle that doesn't transitively reference other Python objects, so GC support isn't required, but taking a weak reference to a resource can make sense - this is actually how I discovered this issue), so my personal preference would be for this to simply work. But I will understand if you decide that it's just too much complexity for too little gain.

chris-se avatar Jun 27 '25 08:06 chris-se

I read this as if Py_TPFLAGS_HAVE_GC flag is not set then Py_TPFLAGS_MANAGED_DICT will not work properly and bad things will happen.

No, in my eyes should implies that in the vast majority of cases the Py_TPFLAGS_HAVE_GC flag will also be set if the Py_TPFLAGS_MANAGED_DICT flag is set, but there might be specific exceptions where this doesn't apply. In contrast must implies that there is no situation in which Py_TPFLAGS_MANAGED_DICT can be set without the Py_TPFLAGS_HAVE_GC flag being also set.

If you throw an error when Py_TPFLAGS_MANAGED_DICT is set but not Py_TPFLAGS_HAVE_GC, then you've implicitly created a must relation between the flags. If it's a NotImplementedError there is an indication that the must relation is due to a current (transient) technical limitation and not a design choice, so the documentation may still say should in that case.

While the Python documentation (to my knowledge) does not officially reference this, I think RFC 2119 provides a good framework for how words such as must or should are to be used in technical documentations and/or specifications.

Thanks for clarification! I didn't think about this in RFC terms.

I don't think Py_TPFLAGS_MANAGED_DICT makes much sense without Py_TPFLAGS_HAVE_GC, so I think here the documentation should be upgraded to use must and a SystemError (or similar, but not NotImplementedError) should be raised when they are not used in conjunction.

I agree that we should fix docs about Py_TPFLAGS_MANAGED_DICT.

On the other hand I think it can make sense to have Py_TPFLAGS_MANAGED_WEAKREF without Py_TPFLAGS_HAVE_GC (think a third-party library representing some resource handle that doesn't transitively reference other Python objects, so GC support isn't required, but taking a weak reference to a resource can make sense - this is actually how I discovered this issue), so my personal preference would be for this to simply work. But I will understand if you decide that it's just too much complexity for too little gain.

Does it make sense for you if Py_TPFLAGS_MANAGED_WEAKREF and Py_TPFLAGS_HAVE_GC must be used in conjunction? On my side I will try to fix current implementation.

sergey-miryanov avatar Jun 27 '25 10:06 sergey-miryanov

Does it make sense for you if Py_TPFLAGS_MANAGED_WEAKREF and Py_TPFLAGS_HAVE_GC must be used in conjunction? On my side I will try to fix current implementation.

As I said previously: in an ideal world I think it would be preferable if you could use Py_TPFLAGS_MANAGED_WEAKREF independently of Py_TPFLAGS_HAVE_GC, but I totally get that it's a lot of added complexity to make this work without (that also makes it harder to test all corner cases), so I'd also be fine with Py_TPFLAGS_HAVE_GC being enforced if Py_TPFLAGS_MANAGED_WEAKREF. In my current extension code I've added Py_TPFLAGS_HAVE_GC (+ the necessary other bits) anyway because of the crash, so I don't even need to change anything if you do that. (Adding Py_TPFLAGS_HAVE_GC for a type that doesn't reference any other Python objects itself is simple, I just needed to add a PyObject_GC_Untrack() to the dealloc method and a traverse implementation that does nothing but return 0.)

A compromise could be that you raise a NotImplementedError instead of a SystemError at the moment to leave the option open to supporting this in the future. But the docs should still be updated in that case to indicate that Py_TPFLAGS_HAVE_GC is not optional when Py_TPFLAGS_MANAGED_WEAKREF is used, at least in current Python versions, if you decide to go that route.

chris-se avatar Jun 27 '25 11:06 chris-se

https://github.com/python/cpython/blob/c419af9e277bea7dd78f4defefc752fe93b0b8ec/Include/internal/pycore_object.h#L939-L942

It looks to me like the offset issues can be resolved all at once. If so, raising NotImplementedError should be enough for now in both cases. Ultimately, we could make the object layout redundant on higher demand, in order to keep the offsets unchanged regardless of the gc support.

I think that if Py_TPFLAGS_HAVE_GC were necessary, coredevs would have already forbidden the omission.

E: Tweaked the permalink as Py_GIL_DISABLED has no issue. It is a breaking change to require Py_TPFLAGS_HAVE_GC there.

neonene avatar Jun 28 '25 03:06 neonene

These may be good starting points to follow the history/context:

  • https://github.com/python/steering-council/issues/133

  • https://docs.python.org/3.11/c-api/typeobj.html#c.PyTypeObject.tp_dictoffset

neonene avatar Jun 29 '25 06:06 neonene

@neonene Thanks! I already read those links, also I read through the https://github.com/python/cpython/issues/95245 and related PR.

sergey-miryanov avatar Jun 29 '25 06:06 sergey-miryanov

We face this is in real work task, so I believe we should fix this to properly work with Py_TPFLAGS_MANAGED_WEAKREF but without Py_TPFLAGS_HAVE_GC.

sergey-miryanov avatar Sep 15 '25 19:09 sergey-miryanov

On the other hand I think it can make sense to have Py_TPFLAGS_MANAGED_WEAKREF without Py_TPFLAGS_HAVE_GC (think a third-party library representing some resource handle that doesn't transitively reference other Python objects, so GC support isn't required, but taking a weak reference to a resource can make sense - this is actually how I discovered this issue), so my personal preference would be for this to simply work. But I will understand if you decide that it's just too much complexity for too little gain.

I just ran into this myself doing something similar to this. A Python wrapper over a C struct. Thank you to the people in this thread for tracking down all the causes and discussing solutions.

Regardless of whether this is intended to work or not, or intended to work or not in the future, the documentation should say that it doesn't work now.

Starbuck5 avatar Sep 20 '25 21:09 Starbuck5

If Py_TPFLAGS_MANAGED_WEAKREF is set, then Py_TPFLAGS_HAVE_GC must also be set.

If a class needs its instances to support weak references, then it can use tp_weaklistoffset. I really don't want to add the extra code on hot paths to handle this relatively obscure case.

We should raise an exception in PyType_Ready for this, as well as fixing the docs.

markshannon avatar Sep 26 '25 11:09 markshannon

@markshannon Ok, thanks! At first attempt I made literally as you said.

I tried to support Py_TPFLAGS_MANAGED_WEAKREF without Py_TPFLAGS_HAVE_GC when removing GC support from _thread.lock ad @kumaraditya303 suggested but faced this bug - https://github.com/python/cpython/issues/116946#issuecomment-3293649225.

So, IIUC, we should use tp_weaklistoffset for _thread.lock - I will try.

sergey-miryanov avatar Sep 26 '25 11:09 sergey-miryanov