cpython icon indicating copy to clipboard operation
cpython copied to clipboard

SIGSEGV with generators after direct change in gi_frame

Open efimov-mikhail opened this issue 1 year ago • 8 comments

Crash report

What happened?

There is a segmentation fault with simple code snippet:

g = (x for x in range(10))
g.gi_frame.f_locals['.0'] = range(20)
list(g)
print("No segfault")

There is a SIGSEGV on my linux machine (Debian GNU/Linux 10) with both main branch Python and 3.13 version. Message "No segfault" is printed on Python 3.7.3 (default, Oct 31 2022, 14:04:00).

CPython versions tested on:

3.13, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a0 (heads/peg_parser_remove_redundant_functions:e1b62e5cf79, Oct 3 2024, 14:38:46) [GCC 8.3.0]

Linked PRs

  • gh-125051
  • gh-125178
  • gh-125846

efimov-mikhail avatar Oct 07 '24 10:10 efimov-mikhail

Backtrace on MacOS:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x00000001001d0c5c python.exe`_PyEval_EvalFrameDefault(tstate=<unavailable>, frame=0x0000000100bcf898, throwflag=0) at generated_cases.c.h:3628:36 [opt]
    frame #2: 0x00000001000a48e0 python.exe`gen_send_ex2 [inlined] _PyEval_EvalFrame(tstate=0x00000001005a8738, frame=<unavailable>, throwflag=0) at pycore_ceval.h:119:16 [opt]
    frame #3: 0x00000001000a48d4 python.exe`gen_send_ex2(gen=0x0000000100bcf850, arg=0x0000000000000000, presult=0x000000016fdfe378, exc=0, closing=<unavailable>) at genobject.c:245:24 [opt]
    frame #4: 0x00000001000a2f28 python.exe`gen_iternext(self=<unavailable>) at genobject.c:612:9 [opt]
    frame #5: 0x00000001000ba0a4 python.exe`list_extend_iter_lock_held(self=0x00000001010ca490, iterable=0x00000001010ca490) at listobject.c:1194:26 [opt]
    frame #6: 0x00000001000b6fb8 python.exe`_list_extend(self=<unavailable>, iterable=<unavailable>) at listobject.c:1367:15 [opt] [artificial]
    frame #7: 0x00000001000bd578 python.exe`list___init___impl(self=0x00000001010ca490, iterable=0x0000000100bcf850) at listobject.c:3391:13 [opt]
    frame #8: 0x00000001000b8ef0 python.exe`list_vectorcall(type=<unavailable>, args=0x00000001009dc690, nargsf=<unavailable>, kwnames=<unavailable>) at listobject.c:3415:13 [opt]
    frame #9: 0x00000001000855f8 python.exe`_PyObject_VectorcallTstate(tstate=0x00000001005a8738, callable=0x0000000100530f30, args=0x00000001009dc690, nargsf=9223372036854775809, kwnames=0x0000000000000000) at pycore_call.h:167:11 [opt]

The crash is in the FOR_ITER instruction; it blindly accesses tp_iternext without verifying that the object being iterated over has this attribute.

The repro case is a bit dubious as it's mutating f_locals, but probably we should indeed support this. The fix would be as simple as inserting a NULL check.

JelleZijlstra avatar Oct 07 '24 14:10 JelleZijlstra

Thank you for such a concrete comment! It seems like I made fix by myself after it.

efimov-mikhail avatar Oct 07 '24 15:10 efimov-mikhail

I think there are two reasonable approaches to fixing this:

  1. Strengthen the compiler/interpreter
  2. Prohibit setting non-identifier keys in the underlying frame

Given that setting an internal variable like .0 is obviously a suspect thing to do, I'd be fine with (2). However, it might break code that stores additional, possibly non-identifier, keys in the locals() dict.

markshannon avatar Oct 09 '24 13:10 markshannon

I can segfault 3.12 and earlier with a variant of this.

g = (x for x in range(10))
gen_expr_func = types.FunctionType(g.gi_code, {})
list(gen_expr_func(range(20)))
print("No segfault")

The root cause is the same: the generator expression does not check that the value passed to it is an iterator.

markshannon avatar Oct 09 '24 13:10 markshannon

So it looks like (1) is the correct fix. See https://github.com/python/cpython/pull/125178#issuecomment-2402332695 for a way to do this.

markshannon avatar Oct 09 '24 13:10 markshannon

IMHO, additional GET_ITER before FOR_ITER for genexpr is a best way to prevent all such crashes. I will try to provide code changes by myself.

efimov-mikhail avatar Oct 09 '24 13:10 efimov-mikhail

The initial fix is merged into main, though we'll want to eventually implement a different fix.

I thought it was worth investigating whether a similar crash was possible with async genexps. However, it seems that the GET_ANEXT impression is implemented more robustly and I couldn't find a crash by manipulating the locals of an asyncgen frame.

JelleZijlstra avatar Oct 22 '24 16:10 JelleZijlstra

However, it seems that the GET_ANEXT impression is implemented more robustly and I couldn't find a crash by manipulating the locals of an asyncgen frame.

Yes, it seems so. Actually, function _PyEval_GetANext from Python/ceval.c contains NULL check for type->tp_as_async->am_anext method. TypeError will be raised if no such method presents. This code looks very similar to one of the previous variants of fixing crash in my PR.

Maybe, we can remove NULL check from the GET_ANEXT bytecode implementation. And place GET_AITER instruction before GET_ANEXT, in similarity with "sync" generators. This change could possible improve perfomance a little bit. But I'm not sure that it's worth it.

On the other side, TypeErrors in GET_AITER and GET_ANEXT slightly differs in case of am_anext == NULL. Maybe we should use the same text in those functions.

cc @JelleZijlstra, @markshannon

efimov-mikhail avatar Oct 23 '24 15:10 efimov-mikhail

See #125723 for another generator f_locals related crash that may be confounding the investigation attempts here (I don't think it's a duplicate, but it's a segfault where we have no idea of the root cause, so that assessment is purely based on the reproducer having a very different structure from the ones described here)

ncoghlan avatar Nov 08 '24 10:11 ncoghlan

I do not think there was a bug. Bytecode manipulation can lead to crash -- this is expected. Addind a duplicated GET_ITER is just a pessimization. It slows down everyones code. I think this change should be reverted.

serhiy-storchaka avatar Dec 25 '24 08:12 serhiy-storchaka

IMO, this issue should be close because of https://github.com/python/cpython/pull/132351

efimov-mikhail avatar May 25 '25 04:05 efimov-mikhail

I've started related d.p.o. thread here: https://discuss.python.org/t/generator-expressions-behavior-with-non-iterable-objects/96329.

efimov-mikhail avatar Jun 22 '25 11:06 efimov-mikhail