cpython icon indicating copy to clipboard operation
cpython copied to clipboard

GH-90997: Wrap `yield from`/`await` in a virtual `try`/`except StopIteration`

Open brandtbucher opened this issue 1 year ago • 1 comments

Supersedes #31968 by setting up a virtual try/except StopIteration to handle the case where a delegated sub-generator returns a value during a close or throw call through a suspended frame. This moves the logic out of the generator throw implementation and into the bytecode, which requires fewer frame hacks. It does, however, require replacing LOAD_ASSERTION_ERROR with a more general LOAD_EXCEPTION_TYPE opcode that can also handle StopIteration.

It also updates the docs for SEND, which are incomplete.

  • Issue: gh-90997

brandtbucher avatar Aug 15 '22 22:08 brandtbucher

This adds a lot of bulk to the bytecode. Take this minimal function:

async def foo(x):
    await x

main

  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

  2           6 LOAD_FAST                0 (x)
              8 GET_AWAITABLE            0
             10 LOAD_CONST               0 (None)
        >>   12 SEND                     3 (to 20)
             14 YIELD_VALUE              2
             16 RESUME                   3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

this PR

  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

  2           6 LOAD_FAST                0 (x)
              8 GET_AWAITABLE            0
             10 LOAD_CONST               0 (None)
        >>   12 SEND                     3 (to 20)
             14 YIELD_VALUE              2
             16 RESUME                   3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
        >>   26 LOAD_EXCEPTION_TYPE      1 (StopIteration)
             28 CHECK_EXC_MATCH
             30 POP_JUMP_FORWARD_IF_TRUE     1 (to 34)
             32 RERAISE                  0
        >>   34 LOAD_ATTR                0 (value)
             54 SWAP                     3
             56 POP_TOP
             58 POP_TOP
             60 POP_TOP
             62 LOAD_CONST               0 (None)
             64 RETURN_VALUE
ExceptionTable:
  14 to 14 -> 26 [2]

What I had envisioned

  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

  2           6 LOAD_FAST                0 (x)
              8 GET_AWAITABLE            0
             10 LOAD_CONST               0 (None)
        >>   12 SEND                     3 (to 20)
             14 YIELD_VALUE              2
             16 RESUME                   3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
             26 THROW                (to 20)
             28 JUMP_BACKWARD_NO_INTERRUPT     (to 14)
ExceptionTable:
  14 to 14 -> 26 [2]

Where THROW throws into the delegated awaitable. If that delegated awaitable returns, then jump. if it yields, then push the value. If it raises, then propagate.

Much like SEND sends to the delegated awaitable, THROW throws into it.

See also https://github.com/faster-cpython/ideas/issues/448

markshannon avatar Aug 16 '22 11:08 markshannon

This adds a lot of bulk to the bytecode.

Yeah, like I said in the meetings: this about triples the size of the bytecode (all in the cold path, though). If all we care about here is the literal size of the bytecode, then we can pack everything in the exception handler into its own dedicated instruction.

This would look a lot like END_ASYNC_FOR, except it would push a value and jump on return. But combining the instructions like that seems to be moving in the wrong direction - in fact, I'd like to see END_ASYNC_FOR broken up like we have here.

What I had envisioned

  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

  2           6 LOAD_FAST                0 (x)
              8 GET_AWAITABLE            0
             10 LOAD_CONST               0 (None)
        >>   12 SEND                     3 (to 20)
             14 YIELD_VALUE              2
             16 RESUME                   3
             18 JUMP_BACKWARD_NO_INTERRUPT     4 (to 12)
        >>   20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
             26 THROW                (to 20)
             28 JUMP_BACKWARD_NO_INTERRUPT     (to 14)
ExceptionTable:
  14 to 14 -> 26 [2]

Where THROW throws into the delegated awaitable. If that delegated awaitable returns, then jump. if it yields, then push the value. If it raises, then propagate.

Much like SEND sends to the delegated awaitable, THROW throws into it.

See also faster-cpython/ideas#448

I've been trying to get something like this to work for the last few weeks in between release blocker firefighting and vacations, but I've just been left convinced that there are too many wrinkles to make it this elegant. Even setting aside the relatively minor annoyance that we need two THROW instructions right now (one forward and one backward), here are just a couple of the issues make this much more complicated than "just SEND but call throw":

  • We actually need to call close if the exception is a GeneratorExit...
  • ...except if we're getting it from an athrow or aclose call, in which case we have different logic for deciding to call athrow or aclose, which have different semantics in the GeneratorExit case. Grep for close_on_genexit in genobject.c for the first breadcrumb in the trail of special async generator logic.
  • We aren't supposed to chain a thrown/closing GeneratorExit to the RuntimeError complaining that we've ignored it, which is a pain to do without changing how literally all other exceptions are chained.
  • The current code has fast paths for closing or throwing into a delegated generator or coroutine if they are exact. Moving all throws and closes into the bytecode would probably make the majority of throws and closes (including generator finalization) a bit slower. If we want to leave these fast paths in the C code, then I don't much see the point of moving the rest into THROW.

For reference, here is my mostly-but-not-quite working THROW branch. If we're really sure that it's worth trying to hack around these issues (and others) in the THROW instructions, then I can keep plugging away at it. But this PR only extracts the problematic parts of the throw/close implementation while leaving all of the other subtle logic alone. That feels like the right separation between the C/Python boundary, in my opinion.

Add to all this that I'm very confident that this PR behaves correctly, and I understand it very well, but am very far from being able to promise that a THROW implementation will preserve the exact semantics for things like async generators and coroutines, of which I know very, very little.

brandtbucher avatar Aug 17 '22 05:08 brandtbucher

Also note that for many compiler-generated await loops, we can simplify the emitted bytecode to skip extracting and pushing the return value where the compiler knows it's not used. Doing the same for all await and yield from expressions would be possible but harder, since by the point we're emitting it we're buried deep in general-purpose expression-compiling code and don't have much context on how the result is used.

brandtbucher avatar Aug 17 '22 05:08 brandtbucher

Sure, there are many complications around what is thrown, or whether close is called, etc. First, move all those complications into a new function, _Py_throw_into().

With this new function:

PySendResult
_Py_throw_into(PyGenObject *subgen, int close_on_genexit, PyObject *typ, PyObject *val, PyObject *tb, PyObject **presult);

We can initially refactor _gen_throw to:

static PyObject *
_gen_throw(PyGenObject *gen, int close_on_genexit,
           PyObject *typ, PyObject *val, PyObject *tb)
{
    PyObject *yf = _PyGen_yf(gen);
    if (yf) {
        PyObect *res;
        PySendResult what = _Py_throw_into(yf, close_on_genexit, type, val, tb, &res);
        switch(what) {
            case PYGEN_RETURN:
                /* Stop SEND loop */
                /* Awkward popping and jumping goes here */
                return gen_send(gen, val);
            case PYGEN_ERROR:
                 return NULL;
            case PYGEN_NEXT:
               return res;
        }
    }
throw_here:
    ...

That doesn't help much by itself, but _Py_throw_into is what we need to implement THROW which, as you point out, cannot jump because it needs close_on_genexit as its oparg.

Instead of generating

  THROW  target
  JUMP_BACKWARD_NO_INTERRUPT back_to_yield

we generate

  THROW close_on_genexit
  POP_JUMP_IF_TRUE target
  JUMP_BACKWARD_NO_INTERRUPT back_to_yield

THROW should be little more than a wrapper around _Py_throw_into, raising on PYGEN_ERROR, pushing True for PYGEN_RETURN and False for PYGEN_NEXT.

markshannon avatar Aug 17 '22 10:08 markshannon

See also https://github.com/python/cpython/pull/96040

markshannon avatar Aug 17 '22 10:08 markshannon

Closing and reopening to re-run the Azure Pipelines failure.

brandtbucher avatar Aug 17 '22 21:08 brandtbucher

@markshannon, I've updated the PR based on our discussion this morning. Let me know how it looks.

brandtbucher avatar Aug 17 '22 21:08 brandtbucher

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

bedevere-bot avatar Aug 19 '22 10:08 bedevere-bot

This is a definite improvement over the current implementation.

Ultimately we want all the handling of sub-generators to done in bytecode. This is a good first step, moving the flow control into the bytecode.

Yeah, this fixes the main issue we were concerned with, but I agree that it makes sense to keep trying to port over more of the throw logic. I plan to return to this whenever I get a few extra cycles.

brandtbucher avatar Aug 19 '22 18:08 brandtbucher

With this PR, async functions with exactly 20 nested blocks started crashing with an assert failure, e.g.:

async def t():
    async with h,t,t,o,f,y,o,t,r,o,f,t,f,r,t,m,r,o,t,l:n
python: Python/compile.c:7185: pop_except_block: Assertion `stack->depth > 0' failed.

see https://github.com/python/cpython/issues/116767

encukou avatar Apr 08 '24 13:04 encukou