trio icon indicating copy to clipboard operation
trio copied to clipboard

Add trio.from_thread.check_cancelled api to allow threads to efficiently poll for cancellation

Open richardsheridan opened this issue 2 years ago • 3 comments

This PR reuses the thread local object in _threads.py to stash the raise_cancel provided to abort, which basically implements a suggestion by @njsmith in https://github.com/python-trio/trio/issues/961#issuecomment-501597755:

Another idea: in run_sync_in_thread, start the thread, then block in wait_task_rescheduled. In our blocked state, if our >abort_fn is called, stash the raise_fn somewhere where the thread can see it. Then trio.from_thread.check_cancel() is just >something like if THREAD_STATE.raise_fn is not None: raise_fn().

This might be simpler, but I bet it will also have some complications by the time we figure out how to forward arbitrary >cancellations to the child thread, and also let the thread re-enter trio and deliver cancellations to those tasks.

I would like to use this to allow a sqlite query to poll for cancellation. Currently, I would have to dig into the internals and use this recipe from @oremanj:

def make_cancel_poller():
    _cancel_status = trio.lowlevel.current_task()._cancel_status

    def poll_for_cancel(*args, **kwargs):
        if _cancel_status.effectively_cancelled:
            raise trio.Cancelled._create()

    return poll_for_cancel

Furthermore, that method is not thread safe if shielding is being toggled:

https://github.com/python-trio/trio/blob/2d62ff037f73ffd129485809ac0f14001e1eab6d/trio/_core/_run.py#L223-L231

richardsheridan avatar Aug 08 '22 16:08 richardsheridan

Codecov Report

Merging #2392 (ab092b0) into master (b161fec) will increase coverage by 0.00%. The diff coverage is 100.00%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #2392    +/-   ##
========================================
  Coverage   99.13%   99.14%            
========================================
  Files         115      115            
  Lines       17242    17432   +190     
  Branches     3085     3107    +22     
========================================
+ Hits        17093    17283   +190     
  Misses        104      104            
  Partials       45       45            
Files Coverage Δ
trio/_tests/test_threads.py 100.00% <100.00%> (ø)
trio/_threads.py 100.00% <100.00%> (ø)
trio/from_thread.py 100.00% <100.00%> (ø)

codecov[bot] avatar Aug 08 '22 17:08 codecov[bot]

I'd like to see threads be able to run code directly in the nursery that started it, this should fix this issue more robustly.

I'll edit this comment to link to what I mean shortly. Remind me if I don't

graingert avatar Aug 08 '22 18:08 graingert

Is this similar to what you are suggesting: https://github.com/python-trio/trio/issues/606#issuecomment-515667078

richardsheridan avatar Aug 08 '22 18:08 richardsheridan

@graingert I have implemented your request (finally!). Would you have a look at it? Also, @agronholm this might have implications for AnyIO so it would be better to have your input sooner than later!

richardsheridan avatar Mar 11 '23 00:03 richardsheridan

This looks very interesting, and I can use this. I'm looking forward to adding this to AnyIO once it's released for Trio.

agronholm avatar Sep 03 '23 17:09 agronholm

it should be type-safe now, but i'm a typing newbie so maybe I could get a review on that aspect as well?

richardsheridan avatar Sep 03 '23 17:09 richardsheridan

What does this do if the thread call gets "uncancelled"? Ie this sequence:

  1. Some outer enclosing scope enters the cancelled state, so run_sync becomes cancelled
  2. A more-inner enclosing scope gets mutated as cscope.shield = True, so run_sync is no longer cancelled
  3. Then the user calls check_cancelled

The straightforward implementation was to make this a level change in cancellation for the thread, so check_cancelled forever raises Cancelled IFF the cancellation delivery was attempted at least once. I believe the docs reflect this, though maybe that needs to be clarified.

I think this is also the "best" behavior because it is an approximation of what would happen if the thread were blocked in trio.from_thread.run(afn). In that case, even in your scenario, a Cancelled would come out of afn and take an indeterminate amount of time to propagate back to the host task, meaning that it could be delivered after raising shields.

richardsheridan avatar Sep 03 '23 22:09 richardsheridan

also if the user has selected cancellable=True, the previous behavior would be to wake up and abandon the thread. I wanted check_cancelled to be a tool for abandoned threads to end early and should not depend on the host task's cancel status.

richardsheridan avatar Sep 03 '23 22:09 richardsheridan

What does this do if the thread call gets "uncancelled"? Ie this sequence:

  1. Some outer enclosing scope enters the cancelled state, so run_sync becomes cancelled
  2. A more-inner enclosing scope gets mutated as cscope.shield = True, so run_sync is no longer cancelled
  3. Then the user calls check_cancelled

Could you explain how the host task would get un-cancelled when it's still waiting on run_sync()? And if it gets cancelled, then this is pretty clear: the thread gets the cancelled flag set too, no matter what happens to the host task afterwards.

agronholm avatar Sep 04 '23 09:09 agronholm

What does this do if the thread call gets "uncancelled"?

I'd regard support for un-cancellation as strictly optional, doubly so regarding threads.

IMHO a cancel scope either shields, or it doesn't. Changing its mode after the fact smells like a race condition waiting to happen.

smurfix avatar Sep 04 '23 11:09 smurfix

I'd be happy to hear suggestions for applying the context in from_thread.run that could avoid two extra checkpoints!

That's the only available way to change a task's context without creating a new task, and it's probably lower-overhead than the new task approach (but I didn't measure). The new task approach would be to push the reentrant call into a system task, which can have whatever context you want, and wrap it in a cancel scope that gets cancelled when the to_thread.run_sync call gets cancelled. I probably prefer the approach here where the reentrant functions run underneath the original task, but they're both workable.

What does this do if the thread call gets "uncancelled"?

from_thread.check_cancelled(): The initial cancellation will call to_thread.run_sync's abort_fn, which will save the raise_cancel and use it the next time the thread checks for cancellation; the "uncancellation" will not be noticed. I think this is fine; it's well-defined, deterministic modulo the question of whether the thread checks for cancellation at all in between when cancellation is asserted and when the thread completes (which is a source of nondeterminism even without shielding), and the Cancelled exception will propagate fine through the shield (we have and test a similar edge case in the non-threads world).

from_thread.run(): The function that runs upon reentry is executed as a normal async call in the to_thread.run_sync task, so it does see the uncancellation. I'm realizing that this results in a discrepancy between from_thread.run(trio.sleep, 0) and from_thread.check_cancelled(), which is a bit unfortunate, and probably we should document it, but I don't think it's a problem in practice. from_thread.check_cancelled() can't observe that the 'uncancellation' has occurred at all unless it pokes into internals (the CancelStatus object). There's no way I know of to be woken up if 'uncancellation' occurs, so the main to_thread.run_sync task can't help it out. Maybe inspecting CancelStatus is fine/worthwhile? Doesn't quite seem worth it to me (it breaks our rules about how non-_core code is supposed to consume _core code), but it's a near thing. But it would be easy to add later if we decide to; I'm comfortable taking the simpler road for now.

There was a comment upthread that said inspecting CancelStatus wouldn't be thread-safe, but I don't think that's actually a problem: if the check is racing with the uncancellation then AFAICT it doesn't really matter whether the check sees the uncancellation or not. However my earlier snipept (which would raise trio.Cancelled._create() if the CancelStatus showed a cancelled status) was incorrect because it didn't handle the KeyboardInterrupt possibility. So we need the current raise_cancel logic regardless; any CancelStatus check for uncancellation would be in addition to that.

oremanj avatar Oct 11 '23 05:10 oremanj

from_thread.check_cancelled(): The initial cancellation will call to_thread.run_sync's abort_fn, which will save the raise_cancel and use it the next time the thread checks for cancellation; the "uncancellation" will not be noticed. I think this is fine; it's well-defined, deterministic [...]

I think there may be another nondeterministic edge case: if the cancellation arrives while the task is busy in await message.run() AND that work happens to be shielded, AND a shield gets raised uptree, then the main "have we ever been cancelled" check will not be triggered. I think a nursery with a task with the sole purpose of waiting for cancellation might be needed to get that exactly right.

As long as we need a nursery to get those semantics right, we might as well also consider an alternative implementation where the each from_thread call spawns a new task in that nursery. After all, it's not really the fact that we are reusing the task which is so useful, but rather that it exists in the right cancellation, contextvar, and treevar contexts. Child tasks would have all these properties, although they might be marginally less efficient.

richardsheridan avatar Oct 11 '23 18:10 richardsheridan

I think there may be another nondeterministic edge case: if the cancellation arrives while the task is busy in await message.run() AND that work happens to be shielded, AND a shield gets raised uptree, then the main "have we ever been cancelled" check will not be triggered. I think a nursery with a task with the sole purpose of waiting for cancellation might be needed to get that exactly right.

I think it depends what you mean by "exactly right". :-) The analogous situation without the thread would not raise a cancellation in this case, so I don't think the thread needs to either. In any event I don't think it would be worth the trouble and overhead of a whole separate task to wait for cancellation.

In general the way Trio behaves is that you need to execute a checkpoint while you are effectively cancelled (have a cancelled cancel scope closer to you than the closest shielded cancel scope) in order to raise Cancelled. If you perform a cancellation and then a shielding in the way that causes the effectively-cancelled status to toggle back and forth, but no checkpoints observe the "cancelled" state, then for your purposes it didn't happen. I think this analogizes nicely to threads if we take the "checkpoints" to be calls to check_cancelled() + (unshielded) checkpoints in reentrant calls to from_thread.run() functions. We can only get that exact behavior if we peek inside the internal CancelStatus object on check_cancelled; otherwise we will get a more conservative / "cancel-happy" behavior where check_cancelled will raise Cancelled if the to_thread.run_sync() call has ever been effectively cancelled, even if it's not anymore. Which is also reasonable! We just need to document the subtleties here.

oremanj avatar Oct 13 '23 01:10 oremanj