cpython icon indicating copy to clipboard operation
cpython copied to clipboard

TypeVar with bad arguments segfault/misbehavior

Open bast0006 opened this issue 1 year ago • 7 comments

Crash report

What happened?

import typing
typing.TypeVar(name="X", bound=type)

Triggers a segfault in python 3.12.3 on all platforms.

Does not reproduce on 3.11 or below.

This bug is similar to #110787, but that fix was never backported to 3.12, and additionally has a flaw that causes some arguments to be shifted in the previously-crashing cases.

When _PyArg_UnpackKeywordsWithVararg gets an input with insufficient positional parameters (which have been provided as keyword arguments) the varargs slot is positioned at the end of the mandatory positional parameters slots. But in the test case, the varargs slot is being overwritten by the bound slot because the keyword argument copy loop that begins here isn't aware of varargs.

If the minimal positionals are provided in the positionals tuple, https://github.com/sobolevn/cpython/blob/c4ca210f12a18edbe30b91aeb6e1915d74936caf/Python/getargs.c#L2525 line in 3.12 (missing the !=) is always true, and the keyword arguments are offset by 1, pushing them to the end of the array and leaving the varargs slot alone. But if there aren't and they need to be backfilled from the keyword arguments, nargs doesn't change in the loop, causing it to overwrite the varargs slot and additionally fail to completely fill the array (causing a segfault when the parent function tries to use that last garbage slot).

This can be fixed by changing the nargs to i so the line reads if (i < vararg) {, then keyword arguments that look up before the varargs entry are not offset, and those that look up after are offset, leaving that slot untouched and ensuring the array is properly filled.

Because i always begins at the end of where the provided positional arguments start, this will hopefully never accidentally overwrite positional arguments, and should solve the problem entirely.

The current fix with != is insufficient because if you provide a third parameter, the != becomes true again, and it reuses a slot. Thanks to a null check it doesn't segfault, but it does result in unexpected behavior:

import typing
T = typing.TypeVar(name="T", bound=type, covariant=True)
assert T.__covariant__

fails with an AssertionError in the 3.13 tag and main.

(TypeVar("T", bound=type, covariant=True).__covariant__ is true, however)

CPython versions tested on:

3.11, 3.12, CPython main branch

Operating systems tested on:

macOS, Windows

Output from running 'python -VV' on the command line:

Python 3.14.0a0 (heads/main-dirty:cb6f75a32ca, May 8 2024, 20:35:11) [Clang 15.0.0 (clang-1500.3.9.4)]

bast0006 avatar May 09 '24 03:05 bast0006

I now realize I'm linking to a 3.12 PR that was merged when I said "wasn't backported", sorry. Part of the fix was backported, but the getargs.c change wasn't. The behavior is additionally a bit messy. My 3.12 install from pyenv accepts the test case with a TypeError: constraints must be a tuple that implies it's not hitting the array-fill segfault (but is hitting bad slot-filling behavior).

From my pyenv non-debug install:

>>> typing.TypeVar(name="T", bound=type, covariant=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: constraints must be a tuple
>>> typing.TypeVar(name="T", bound=(type,), covariant=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: A single constraint is not allowed
>>> typing.TypeVar(name="T", bound=(type, int), covariant=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Constraints cannot be combined with bound=...
>>> typing.TypeVar(name="T", bound=(type, int))
T
>>> typing.TypeVar(name="T", bound=(type, int)).__bound__
>>> typing.TypeVar(name="T", bound=(type, int)).__constraints__
((...), <class 'int'>)
>>> typing.TypeVar(name="T", bound=tuple(), covariant=True)
zsh: segmentation fault  python

The debug compile segfaults when given the two-item testcase, but accepts a simple TypeVar(name="T") correctly.

And after restarting python it isn't consistent in crashing, but it is consistent in behavior, making me think I hit on the right mechanism of action:

typing.TypeVar(name="T", bound=tuple(), covariant="OhNo").__bound__
ForwardRef('OhNo')

bast0006 avatar May 09 '24 04:05 bast0006

The crash is in Argument Clinic code, cc @erlend-aasland.

>>> from typing import *
>>> TypeVar(name="x", bound=type)
Process 22706 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x5c)
    frame #0: 0x00000001000dbc6c python.exe`PyObject_IsTrue(v=0x000000016fdfe6b0) at object.c:1813:40 [opt]
   1810	    if (v == Py_None)
   1811	        return 0;
   1812	    else if (Py_TYPE(v)->tp_as_number != NULL &&
-> 1813	             Py_TYPE(v)->tp_as_number->nb_bool != NULL)
   1814	        res = (*Py_TYPE(v)->tp_as_number->nb_bool)(v);
   1815	    else if (Py_TYPE(v)->tp_as_mapping != NULL &&
   1816	             Py_TYPE(v)->tp_as_mapping->mp_length != NULL)
Target 0: (python.exe) stopped.
warning: python.exe was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x5c)
  * frame #0: 0x00000001000dbc6c python.exe`PyObject_IsTrue(v=0x000000016fdfe6b0) at object.c:1813:40 [opt]
    frame #1: 0x0000000100117524 python.exe`typevar_new(type=<unavailable>, args=<unavailable>, kwargs=<unavailable>) at typevarobject.c.h:107:22 [opt]
    frame #2: 0x0000000100104dd0 python.exe`type_call(self=0x0000000101813c30, args=0x000000010048c7b0, kwds=0x00000001028d7d70) at typeobject.c:1897:11 [opt]

JelleZijlstra avatar May 09 '24 04:05 JelleZijlstra

@JelleZijlstra that crash is the result of passing 0x1 into PyObject_IsTrue, which is what happens when typevar_new gets the incorrectly parsed argument array back from _PyArg_UnpackKeywordsWithVararg.

 $ ./python.exe -VV
Python 3.12.3+ (heads/3.12:530c3bb271d, May  8 2024, 21:35:17) [Clang 15.0.0 (clang-1500.3.9.4)]
(gdb) up
#1  0x0000000100129b72 in typevar_new (type=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at Objects/clinic/typevarobject.c.h:100
100	    infer_variance = PyObject_IsTrue(fastargs[5]);
(gdb) info locals
_kwtuple = {_this_is_not_used = {_gc_next = 0, _gc_prev = 0}, ob_base = {ob_base = {{ob_refcnt = 4294967295, ob_refcnt_split = {4294967295, 0}}, ob_type = 0x100520c60}, 
    ob_size = 5}, ob_item = {'name', 'bound', 'covariant', 'contravariant', 'infer_variance'}}
_keywords = {0x100376e78 "name", 0x1003a352c "bound", 0x1003a3532 "covariant", 0x1003a353c "contravariant", 0x1003a354a "infer_variance", 0x0}
_parser = {initialized = -1, format = 0x0, keywords = 0x10048fd80, fname = 0x1003a3559 "typevar", custom_msg = 0x0, pos = 0, min = 0, max = 0, 
  kwtuple = ('name', 'bound', 'covariant', 'contravariant', 'infer_variance'), next = 0x10051a1e8}
argsbuf = {'T', <type at remote 0x100521130>, 0x0, 0x0, 0x0, <unknown at remote 0x1>}
return_value = 0x0
nargs = <optimized out>
noptargs = <optimized out>
constraints = <type at remote 0x100521130>
covariant = 0
contravariant = 0
infer_variance = 0
fastargs = 0x7ff7bfefec30
bound = None
name = 'T'
(gdb) p fastargs[5]
$1 = <unknown at remote 0x1>

Note the contents of argsbuf, which is constructed by _PyArg_UnpackKeywordsWithVararg. That last argument should be either NULL or a python type, but it's neither. You can also see that name is assigned right, but bound is None and type has been assigned into constraints because of the bug.

bast0006 avatar May 09 '24 05:05 bast0006

Right, and that code is generated by Argument Clinic (note the path Objects/clinic/typevarobject.c.h).

Also, thanks for the detailed bug report!

JelleZijlstra avatar May 09 '24 13:05 JelleZijlstra

Right, and that code is generated by Argument Clinic (note the path Objects/clinic/typevarobject.c.h).

Also, thanks for the detailed bug report!

Thanks! Hopefully it's helpful.

That is true. I just mean that I traced the bad data/parsing collision out of argument clinic code and into handwritten code in getargs.c, which makes me think the bug actually isn't in argument clinic code, even if it is triggered by it.

bast0006 avatar May 09 '24 16:05 bast0006

I'm trying to reproduce this[^1] on macOS, but so far I've been unable to do so. A couple of observations:

  • AFAICS, we have no other Argument Clinic code with this signature, so this is the only case where this particular generation path happens
  • We have no Argument Clinic tests for this input / generated output

[^1]: the crash, that is

erlend-aasland avatar May 20 '24 19:05 erlend-aasland

_PyArg_UnpackKeywordsWithVararg is doing some weird stuff here. Either this is a bug in _PyArg_UnpackKeywordsWithVararg, or it is a bug in how we call it from the generated code.

erlend-aasland avatar May 20 '24 21:05 erlend-aasland

I can reproduce the issue on Python 3.14.0a0 (heads/main:88030861e21, Aug 1 2024, 12:49:44) [GCC 7.5.0]. I'll try to see if I can do something for that.

picnixz avatar Aug 01 '24 10:08 picnixz

I can't reproduce it in 3.13, is the reason that https://github.com/python/cpython/pull/118009 seems to only have been backported to 3.12?

devdanzin avatar Aug 01 '24 12:08 devdanzin

I couldn't reproduce it the first time because I used TypeVar('x', bound=type). Check that you used TypeVar(name='x', bound=type) for it to work (well in this case, to crash). But there is anyway an issue on 3.14.

picnixz avatar Aug 01 '24 13:08 picnixz

TypeVar(name='x', bound=type)

That does it, thanks! Should this be a release blocker for 3.13 then?

devdanzin avatar Aug 01 '24 13:08 devdanzin

3.13rc1 has already been released so it's not a blocker. But I don't know whether we'll merge it before releasing 3.13.0.

@Yhg1s If my fix is correct, should it land or not? and if my fix is not correct, does this count as a blocker of 3.13.0?

picnixz avatar Aug 01 '24 13:08 picnixz

We should definitely merge a fix for this issue into 3.13 even during the RC phase (segfaults are bugs). Not sure it should be marked as a release blocker though since 3.12 is also affected.

JelleZijlstra avatar Aug 01 '24 14:08 JelleZijlstra

There's defnitely something wrong with AC here but I won't have time to fix it today.

I think it has to do with the number of defaults keywords. I see different results (sometimes test failures, sometimes crashes) depending on whether there are 2 or 3 keywords. We should test the following cases:

self.assertEqual(func(1, kw1=True), (1, (), True, False))
self.assertEqual(func(1, kw2=True), (1, (), False, True))
self.assertEqual(func(1, kw1=True, kw2=True), (1, (), True, True))
self.assertEqual(func(1, kw2=True, kw1=True), (1, (), True, True))

self.assertEqual(func(1, 2, 3, 4), (1, (2, 3, 4), False, False))
self.assertEqual(func(1, 2, 3, 4, kw1=True), (1, (2, 3, 4), True, False))
self.assertEqual(func(1, 2, 3, 4, kw1=True), (1, (2, 3, 4), True, False))
self.assertEqual(func(1, 2, 3, 4, kw1=True, kw2=True), (1, (2, 3, 4), True, True))

self.assertEqual(func(a=1), (1, (), False, False))
self.assertEqual(func(a=1, kw1=True), (1, (), True, False))
self.assertEqual(func(kw1=True, a=1), (1, (), True, False))

self.assertEqual(func(a=1, kw2=True), (1, (), False, True))
self.assertEqual(func(kw2=True, a=1), (1, (), False, True))

with

/*[clinic input]
vararg_with_multiple_defaults

    a: object
    *args: object
    kw1: bool = False
    kw2: bool = False

[clinic start generated code]*/

static PyObject *
vararg_with_multiple_defaults_impl(PyObject *module, PyObject *a,
                                   PyObject *args, int kw1, int kw2)
/*[clinic end generated code: output=ae7ee8d22dfc7fbf input=534d91e23e6d360b]*/
{
    PyObject *obj_kw1 = kw1 ? Py_True : Py_False;
    PyObject *obj_kw2 = kw2 ? Py_True : Py_False;
    return pack_arguments_newref(4, a, args, obj_kw1, obj_kw2);
}

If the tests fail, then something's wrong. I couldn't test every combination but the following always crashes, whether I applied my (wrong) fix or not:

self.assertEqual(func(a=1, kw1=True, kw2=True), (1, (), True, True))
self.assertEqual(func(a=1, kw2=True, kw1=True), (1, (), True, True))
self.assertEqual(func(kw1=True, a=1, kw2=True), (1, (), True, True))
self.assertEqual(func(kw2=True, a=1, kw1=True), (1, (), True, True))
self.assertEqual(func(kw1=True, kw2=True, a=1), (1, (), True, True))
self.assertEqual(func(kw2=True, kw1=True, a=1), (1, (), True, True))

I think the logic of _PyArg_UnpackKeywordsWithVararg should be entirely reworked.

picnixz avatar Aug 01 '24 16:08 picnixz

There are several issues here.

  • There is a bug in _PyArg_UnpackKeywordsWithVararg. It is only manifested when keyword arguments are passed for keyword-or-positional parameters. So print() does not have this bug, but the TypeVar constructor does. The fix is quite simple: #122664.
  • There are several bugs in Argument Clinic (@picnixz discovered this when tried to write more extended Argument Clinic tests). This is not directly related to this issue, so I'll address this in a separate issue. I already have a solution.
  • There are questions about the design of _PyArg_UnpackKeywordsWithVararg. I'm going to re-design it in a separate issue, after fixing bugs in _PyArg_UnpackKeywordsWithVararg and Argument Clinic.

serhiy-storchaka avatar Aug 05 '24 07:08 serhiy-storchaka