GH-90997: Wrap `yield from`/`await` in a virtual `try`/`except StopIteration`

Supersedes #31968 by setting up a virtual `try`/`except StopIteration` to handle the case where a delegated sub-generator returns a value during a `close` or `throw` call through a suspended frame. This moves the logic out of the generator `throw` implementation and into the bytecode, which requires fewer frame hacks. It does, however, require replacing `LOAD_ASSERTION_ERROR` with a more general `LOAD_EXCEPTION_TYPE` opcode that can also handle `StopIteration`.

It also updates the docs for `SEND`, which are incomplete.
- Issue: gh-90997
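As a pure-Python illustration of the behavior the virtual `try`/`except` encodes (a sketch of the semantics, not CPython's implementation): when an exception is thrown into a suspended `yield from` and the delegated sub-generator responds by *returning*, the outer generator receives the return value via `StopIteration.value` instead of seeing the exception propagate.

```python
def sub():
    try:
        yield 1
    except ValueError:
        return "done"   # becomes StopIteration("done") at the throw site

def outer():
    result = yield from sub()  # result receives sub()'s return value
    yield result

g = outer()
first = next(g)                # advance to `yield 1` inside sub()
resumed = g.throw(ValueError)  # sub() returns; outer resumes with "done"
```

Here `g.throw(ValueError)` does not raise: the `yield from` machinery catches the sub-generator's `StopIteration` and resumes `outer` with the value `"done"`.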
This adds a lot of bulk to the bytecode. Take this minimal function:

```python
async def foo(x):
    await x
```

On `main`:

```
  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                    0

  2           6 LOAD_FAST                 0 (x)
              8 GET_AWAITABLE             0
             10 LOAD_CONST                0 (None)
        >>   12 SEND                      3 (to 20)
             14 YIELD_VALUE               2
             16 RESUME                    3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST                0 (None)
             24 RETURN_VALUE
```
With this PR:

```
  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                    0

  2           6 LOAD_FAST                 0 (x)
              8 GET_AWAITABLE             0
             10 LOAD_CONST                0 (None)
        >>   12 SEND                      3 (to 20)
             14 YIELD_VALUE               2
             16 RESUME                    3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST                0 (None)
             24 RETURN_VALUE
        >>   26 LOAD_EXCEPTION_TYPE       1 (StopIteration)
             28 CHECK_EXC_MATCH
             30 POP_JUMP_FORWARD_IF_TRUE  1 (to 34)
             32 RERAISE                   0
        >>   34 LOAD_ATTR                 0 (value)
             54 SWAP                      3
             56 POP_TOP
             58 POP_TOP
             60 POP_TOP
             62 LOAD_CONST                0 (None)
             64 RETURN_VALUE
ExceptionTable:
  14 to 14 -> 26 [2]
```
What I had envisioned:

```
  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                    0

  2           6 LOAD_FAST                 0 (x)
              8 GET_AWAITABLE             0
             10 LOAD_CONST                0 (None)
        >>   12 SEND                      3 (to 20)
             14 YIELD_VALUE               2
             16 RESUME                    3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST                0 (None)
             24 RETURN_VALUE
             26 THROW                       (to 20)
             28 JUMP_BACKWARD_NO_INTERRUPT      (to 14)
ExceptionTable:
  14 to 14 -> 26 [2]
```
Where `THROW` throws into the delegated awaitable. If that delegated awaitable returns, then jump. If it yields, then push the value. If it raises, then propagate.

Much like `SEND` sends to the delegated awaitable, `THROW` throws into it.
See also https://github.com/faster-cpython/ideas/issues/448
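The envisioned `THROW` semantics can be modeled in pure Python; this is an illustrative sketch only (the name `throw_into` here is made up for the example, not CPython API), mirroring PEP 380's throw handling for a delegated generator:

```python
def throw_into(yf, exc):
    """Model of THROW: throw exc into the delegate yf.

    Returns ("return", value) if the delegate returns (jump past the
    SEND loop), or ("yield", value) if it yields again (stay suspended).
    Any other exception simply propagates.
    """
    try:
        yielded = yf.throw(exc)
    except StopIteration as si:
        return ("return", si.value)   # delegate returned: jump
    return ("yield", yielded)         # delegate yielded: push the value

def sub():
    try:
        yield "suspended"
    except KeyError:
        return 42

g = sub()
next(g)                               # park sub() at its yield
outcome = throw_into(g, KeyError())   # sub() catches and returns 42
```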
> This adds a lot of bulk to the bytecode.
Yeah, like I said in the meetings: this about triples the size of the bytecode (all in the cold path, though). If all we care about here is the literal size of the bytecode, then we can pack everything in the exception handler into its own dedicated instruction. This would look a lot like `END_ASYNC_FOR`, except it would push a value and jump on return. But combining the instructions like that seems to be moving in the wrong direction - in fact, I'd like to see `END_ASYNC_FOR` broken up like we have here.
I've been trying to get something like this to work for the last few weeks in between release blocker firefighting and vacations, but I've just been left convinced that there are too many wrinkles to make it this elegant. Even setting aside the relatively minor annoyance that we need two `THROW` instructions right now (one forward and one backward), here are just a couple of the issues that make this much more complicated than "just `SEND` but call `throw`":

- We actually need to call `close` if the exception is a `GeneratorExit`...
- ...except if we're getting it from an `athrow` or `aclose` call, in which case we have different logic for deciding to call `athrow` or `aclose`, which have different semantics in the `GeneratorExit` case. Grep for `close_on_genexit` in `genobject.c` for the first breadcrumb in the trail of special async generator logic.
- We aren't supposed to chain a thrown/closing `GeneratorExit` to the `RuntimeError` complaining that we've ignored it, which is a pain to do without changing how literally all other exceptions are chained.
- The current code has fast paths for closing or throwing into a delegated generator or coroutine if they are exact. Moving all throws and closes into the bytecode would probably make the majority of throws and closes (including generator finalization) a bit slower. If we want to leave these fast paths in the C code, then I don't much see the point of moving the rest into `THROW`.
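The first wrinkle is observable from pure Python: closing a generator that is suspended in a `yield from` must *close* the delegate (running its `finally` blocks), not merely throw `GeneratorExit` through it. A small demonstration:

```python
log = []

def sub():
    try:
        yield
    finally:
        log.append("sub closed")   # runs when the sub-generator is closed

def outer():
    yield from sub()

g = outer()
next(g)      # suspend inside sub()
g.close()    # GeneratorExit path: sub() is closed too, not just thrown into
```

After `g.close()`, `log` contains `"sub closed"`, showing that the delegate's cleanup ran - exactly the `close_on_genexit` behavior a `THROW` instruction would have to reproduce.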
For reference, here is my mostly-but-not-quite-working `THROW` branch. If we're really sure that it's worth trying to hack around these issues (and others) in the `THROW` instructions, then I can keep plugging away at it. But this PR only extracts the problematic parts of the `throw`/`close` implementation while leaving all of the other subtle logic alone. That feels like the right separation between the C/Python boundary, in my opinion.

Add to all this that I'm very confident that this PR behaves correctly, and I understand it very well, but I am very far from being able to promise that a `THROW` implementation will preserve the exact semantics for things like async generators and coroutines, of which I know very, very little.
Also note that for many compiler-generated `await` loops, we can simplify the emitted bytecode to skip extracting and pushing the return value where the compiler knows it's not used. Doing the same for all `await` and `yield from` expressions would be possible but harder, since by the point we're emitting it we're buried deep in general-purpose expression-compiling code and don't have much context on how the result is used.
Sure, there are many complications around what is thrown, or whether close is called, etc. First, move all those complications into a new function, `_Py_throw_into()`:

```c
PySendResult
_Py_throw_into(PyGenObject *subgen, int close_on_genexit,
               PyObject *typ, PyObject *val, PyObject *tb, PyObject **presult);
```

With this new function, we can initially refactor `_gen_throw` to:
```c
static PyObject *
_gen_throw(PyGenObject *gen, int close_on_genexit,
           PyObject *typ, PyObject *val, PyObject *tb)
{
    PyObject *yf = _PyGen_yf(gen);
    if (yf) {
        PyObject *res;
        PySendResult what = _Py_throw_into(yf, close_on_genexit, typ, val, tb, &res);
        switch (what) {
            case PYGEN_RETURN:
                /* Stop SEND loop */
                /* Awkward popping and jumping goes here */
                return gen_send(gen, val);
            case PYGEN_ERROR:
                return NULL;
            case PYGEN_NEXT:
                return res;
        }
    }
throw_here:
    ...
```
That doesn't help much by itself, but `_Py_throw_into` is what we need to implement `THROW` which, as you point out, cannot jump because it needs `close_on_genexit` as its oparg. Instead of generating:

```
THROW                       target
JUMP_BACKWARD_NO_INTERRUPT  back_to_yield
```

we generate:

```
THROW                       close_on_genexit
POP_JUMP_IF_TRUE            target
JUMP_BACKWARD_NO_INTERRUPT  back_to_yield
```
`THROW` should be little more than a wrapper around `_Py_throw_into`, raising on `PYGEN_ERROR`, pushing `True` for `PYGEN_RETURN` and `False` for `PYGEN_NEXT`.
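A pure-Python model of this `THROW`/`POP_JUMP_IF_TRUE` pairing might look like the sketch below (`throw_step` is an illustrative name, not a real API; the `close_on_genexit` branch here is a simplified stand-in for the full `_Py_throw_into` logic). It returns the boolean the jump would consume plus the value to push:

```python
def throw_step(yf, exc, close_on_genexit):
    """Model one THROW: (True, retval) jumps forward, (False, yielded) loops back."""
    if close_on_genexit and isinstance(exc, GeneratorExit):
        yf.close()                     # simplified stand-in for the close path
        raise exc                      # propagate, as PYGEN_ERROR would
    try:
        value = yf.throw(exc)
    except StopIteration as si:
        return True, si.value          # PYGEN_RETURN: POP_JUMP_IF_TRUE jumps forward
    return False, value                # PYGEN_NEXT: fall through, loop to YIELD_VALUE

def sub():
    try:
        yield "pending"
    except KeyError:
        return "finished"

g = sub()
next(g)                                # suspend sub() at its yield
result = throw_step(g, KeyError(), 1)  # sub() catches and returns
```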
See also https://github.com/python/cpython/pull/96040
Closing and reopening to re-run the failed Azure Pipelines check.
@markshannon, I've updated the PR based on our discussion this morning. Let me know how it looks.
This is a definite improvement over the current implementation.
Ultimately we want all the handling of sub-generators to be done in bytecode. This is a good first step, moving the flow control into the bytecode.
Yeah, this fixes the main issue we were concerned with, but I agree that it makes sense to keep trying to port over more of the `throw` logic. I plan to return to this whenever I get a few extra cycles.
With this PR, async functions with exactly 20 nested blocks started crashing with an assertion failure, e.g.:

```python
async def t():
    async with h,t,t,o,f,y,o,t,r,o,f,t,f,r,t,m,r,o,t,l:n
```

```
python: Python/compile.c:7185: pop_except_block: Assertion `stack->depth > 0' failed.
```

See https://github.com/python/cpython/issues/116767