
Cancellation blocks `task_status.started`

Open • smurfix opened this issue 1 year ago • 2 comments

The problem: when a task starts a shielded subtask, but is cancelled before the subtask re-parents itself, the cancellation isn't propagated until the subtask ends.

Consider this code:

import trio

async def t(task_status):
    with trio.CancelScope(shield=True) as sc:  # turn off shielding, the code works
        await trio.sleep(2)
        print("Send",sc)
        task_status.started(sc)
        await trio.sleep(3)
        print("Terminating")

async def r(tg):
    sc = await tg.start(t)
    print("Receive",sc)
    sc.cancel()

async def main():
    async with trio.open_nursery() as tg:
        tg.start_soon(r,tg)
        await trio.sleep(1)
        tg.cancel_scope.cancel()  # comment this out and the code works

trio.run(main)

What I expect to happen: tg.start returns the value from task_status.started, task t gets cancelled, and this code takes one second to run.

In my "real" use case the subtask opens a database connection which must be (a) cached and (b) closed properly, so turning off the shield won't work. The cancellation in the last line of main is a stand-in for any kind of exception that might happen in the rest of the program.

smurfix, Jan 24 '23 15:01

Nursery.start() is implemented using an inner nursery. started() moves a task from this inner nursery to the final nursery for it, which causes the inner nursery to be closed. The inner nursery executes a checkpoint in its __aexit__. This raises Cancelled and stops you from seeing the result of started(). All the surprises here are in r(), not in t(). See also #1457 for other consequences of nursery __aexit__ being a checkpoint. We reached consensus there about a way forward but I don't think it was implemented. I believe fixing that would fix this issue too.

oremanj, Jan 24 '23 16:01

what i believe should happen:

  • at t = 1 s: r is doing await tg.start(t), and t is halfway through its shielded await sleep(2). await sleep(2) is shielded so it keeps sleeping.

  • at t = 2 s:

    • print("Send", sc).

    • task_status.started(sc), so t gets reparented into tg.

    • t starts a shielded await sleep(3).

    • t is now started (i.e. reparented), so r's await tg.start(t) returns sc.

    • print("Receive", sc).

    • sc.cancel().

    • await sleep(3) raises Cancelled, sc catches it, and tg exits.

N.B.: it seems to me that the expected runtime should be 2 s, not the 1 s that you wrote?

what happened prior to #1696:

  • at t = 1 s: r is doing await tg.start(t), and t is halfway through its shielded await sleep(2). await sleep(2) is shielded so it keeps sleeping.

  • at t = 2 s:

    • print("Send", sc).

    • task_status.started(sc) incorrectly returns without actually reparenting t. even though r is in a cancelled scope, it remains blocked doing await tg.start(t) until t (which is shielded) finishes.

    • t starts a shielded await sleep(3).

  • at t = 5 s:

    • print("Terminating").

    • t finishes (inside of the internal tg.start(t) nursery). this wakes up the internal nursery in start(). await tg.start(t) raises Cancelled because internal_nursery.__aexit__ raises Cancelled when run in a cancelled scope.

    • ("Receive" never gets printed)

what happens currently:

  • at t = 1 s: r is doing await tg.start(t), and t is halfway through its shielded await sleep(2). await sleep(2) is shielded so it keeps sleeping.

  • at t = 2 s:

    • print("Send", sc).

    • task_status.started(sc) incorrectly returns without actually reparenting t. even though r is in a cancelled scope, it remains blocked doing await tg.start(t) until t (which is shielded) finishes.

    • t starts a shielded await sleep(3).

  • at t = 5 s:

    • print("Terminating").

    • t finishes (inside of the internal tg.start(t) nursery). this wakes up the internal nursery in start(). await tg.start(t) returns sc.

    • print("Receive", sc).

    • sc.cancel(). but sc has already exited, so this does nothing.

gschaffner, Dec 02 '23 14:12