anyio icon indicating copy to clipboard operation
anyio copied to clipboard

Task group is swallowing `CancelledError`.

Open jonathanslenders opened this issue 3 years ago • 11 comments

If any of the tasks within the task group raise a CancelledError, then that does not propagate out of the task group. In my case, this happens because the code within the task group is cancelled from somewhere else.

from anyio import create_task_group
import asyncio

async def main():
    async def in_task_group():
        raise asyncio.CancelledError

    async with create_task_group() as tg:
        tg.start_soon(in_task_group)

asyncio.run(main())

Would this be a bug? Or is this expected behavior?

jonathanslenders avatar Oct 06 '21 16:10 jonathanslenders

Related to this, the following snippet exposes very weird behavior:

from anyio import create_task_group
import asyncio

async def test():
    # Create a task group with one very long running task in here.
    async def in_task_group():
        await asyncio.sleep(100)

    async with create_task_group() as tg:
        tg.start_soon(in_task_group)

    # - The main function should cancel the coroutine around this point -

    # I would not expect any of the following to be printed.
    # Somehow, the task within the task group gets cancelled, but this function
    # does not get cancelled.

    print('Before sleep')
    await asyncio.sleep(2)
    print('After sleep')

async def main():
    # Create task.
    task = asyncio.create_task(test())

    # Cancel task.
    await asyncio.sleep(1)
    task.cancel()

    await task

asyncio.run(main())

jonathanslenders avatar Oct 06 '21 17:10 jonathanslenders

Well...this is how anyio has always worked. On trio we don't have this problem because cancel scopes are the only way to cancel things there. On asyncio, cancelling a task just means raising CancelledError once in the target task. I think that most people have not had this problem because they've used structural concurrency all the way. @graingert any thoughts?

agronholm avatar Oct 07 '21 08:10 agronholm

Well, imagine we have a library that uses structured concurrency everywhere within that library. But this library is called by a different codebase that does not follow structured concurrency. The above issue means that it's impossible to use libraries written with anyio in a code base that doesn't use anyio by itself. Is that correct?

jonathanslenders avatar Oct 07 '21 13:10 jonathanslenders

Suppose a task group was already cancelled via SC, and then another task natively cancels the host task. How would AnyIO tell the difference?

agronholm avatar Oct 08 '21 20:10 agronholm

@agronholm, honestly, I don't know. I worked around this by using a cancel scope anyway. But I think the issue remains that at some point, a task created by a library using anyio, could be cancelled using normal asyncio cancellation.

I wonder whether it's feasible to create a little wrapper that runs the anyio task in a cancel scope, catches (elsewhere) the CancelledError that asyncio raises, and as a result cancels the cancel scope.

jonathanslenders avatar Oct 13 '21 08:10 jonathanslenders

A potential solution that I came up a couple days ago would be to wait for the subtasks in a shielded cancel scope. That way, only native cancellations would get through and would then be forced to propagate out of the task group.

agronholm avatar Oct 13 '21 08:10 agronholm

To detect native cancellations prior to that, we would check, upon cancellation, if one of the enclosing cancel scopes had been cancelled. If not, it's a native cancellation.

agronholm avatar Oct 13 '21 08:10 agronholm

I think I found a way around this, but on Python 3.9 and above only: we can set a special cancellation message in the exception to detect cancel scope based cancellations.

agronholm avatar Feb 12 '22 22:02 agronholm

@jonathanslenders (https://github.com/agronholm/anyio/issues/374#issuecomment-936789037):

Related to this, the following snippet exposes very weird behavior: [...]

I translated this example to the new (3.11) asyncio.TaskGroup idiom and it seems to work.

import asyncio

async def test():
    # Create a task group with one very long running task in here.
    async def in_task_group():
        await asyncio.sleep(100)

    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(in_task_group())
    except asyncio.CancelledError:
        print("pass")

    # - The main function should cancel the coroutine around this point -

    # I would not expect any of the following to be printed.
    # Somehow, the task within the task group gets cancelled, but this function
    # does not get cancelled.

    print('Before sleep')
    await asyncio.sleep(2)
    print('After sleep')

async def main():
    # Create task.
    task = asyncio.create_task(test())

    # Cancel task.
    await asyncio.sleep(1)
    task.cancel()

    await task

asyncio.run(main())

(Only this is really changed:)

    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(in_task_group())
    except asyncio.CancelledError:
        print("pass")

This produces the following output:

pass
Before sleep
After sleep

Is that what you would like to happen?

gvanrossum avatar Feb 16 '22 21:02 gvanrossum

@gvanrossum : Thanks for responding (I just see your reply). This is indeed what I'd expect. The CancelledError should propagate out of the TaskGroup.

(I'm no longer sure whether ever catching a CancelledError is a good idea though. With anyio, I prefer finally blocks that are shielded from cancellation.)

jonathanslenders avatar Jul 05 '22 10:07 jonathanslenders

Thanks for confirming.

gvanrossum avatar Jul 05 '22 21:07 gvanrossum

Fixing this is a high priority for v4.0. My initial attempts have not gone very well, as I'm seeing recursion errors from exception groups.

agronholm avatar Oct 18 '22 07:10 agronholm

Do you need my involvement? If not, I'd like to unsubscribe from this issue.

gvanrossum avatar Oct 18 '22 15:10 gvanrossum

Do you need my involvement? If not, I'd like to unsubscribe from this issue.

It's okay, feel free to unsubscribe.

agronholm avatar Oct 19 '22 10:10 agronholm

@agronholm are any updates on this? Hope you'll find opportunity to fix it 🙏🏻

decaz avatar Nov 29 '22 13:11 decaz

I already have, in #496. The PR includes some heavy handed overall changes to cancellation semantics, so it requires extra care and review. I myself am not done with it yet.

agronholm avatar Nov 29 '22 18:11 agronholm

bump

reidmeyer avatar Jan 24 '23 10:01 reidmeyer

bump

I'm not working fast enough to your tastes? I have a dozen projects to maintain, and I'll switch back to AnyIO soon-ish.

agronholm avatar Jan 24 '23 15:01 agronholm

bump

I'm not working fast enough to your tastes? I have a dozen projects to maintain, and I'll switch back to AnyIO soon-ish.

Sorry, don't mean to be rude. Have had a long day diving down the rabbit hole.

reidmeyer avatar Jan 24 '23 15:01 reidmeyer

Welcome to the rabbit warren, I'm also hitting this one. Trying to find a reasonable workaround isn't easy, and this Trio bug doesn't exactly help.

smurfix avatar Jan 24 '23 15:01 smurfix

This was incidentally the last bug I was working on before getting side-tracked by other projects a few weeks ago. I got stumped when playing whack-a-mole with test failures, and the solution itself was pretty complex too. Cancellation is really, really hard to get right.

agronholm avatar Jan 24 '23 15:01 agronholm

My colleagues and I faced this bug months ago, where a disconnected sse stream wasn't being handled in the exception catch. Our solution was downgraded fastapi/starlette to a 2 year old version. Hoping in a few months we'll be able to get up to date.

reidmeyer avatar Jan 24 '23 15:01 reidmeyer

My colleagues and I faced this bug months ago, where a disconnected sse stream wasn't being handled in the exception catch. Our solution was downgraded fastapi/starlette to a 2 year old version. Hoping in a few months we'll be able to get up to date.

Sorry for the trouble. It may not be much of a consolation, but I've been hit by this bug too occasionally, and it's my highest priority in this project right now (that is, when I get back to working on it).

agronholm avatar Jan 24 '23 15:01 agronholm

Any ETA on this? Anything we can do to help?

drernie avatar May 10 '23 04:05 drernie

The first order of business I have with AnyIO right now is getting the v3.7.0 release out. It contains a ton of other fixes that have waited a long time to be released. One of these fixes will even make tackling this issue a bit easier by reducing the amount of new code that would have to be written. I'm just about ready to make that release. After that, I can fully focus on fixing this particular issue, but I will have to scrap my previous attempt and start from scratch, as the code got pretty complicated back then and I was overwhelmed with the complexity. As for an ETA, I hope to get this done in 2-3 weeks.

agronholm avatar May 12 '23 23:05 agronholm

I've restarted my efforts based on current master. I have a test in place that reproduces the issue. I think I might start writing a fix that works on Python 3.11+ (where thanks to uncancel() we can actually track the cancellations without too many hacks). Then I could work on said hacks for earlier Pythons, as I was trying to do before going on a hiatus.

agronholm avatar May 16 '23 22:05 agronholm

Hi @agronholm, If you didn't start the progress, I'll be happy to help. Are there any good points to start?

T-256 avatar May 17 '23 14:05 T-256

Any luck?

reidmeyer avatar Jun 01 '23 11:06 reidmeyer

Happy to report that I'm making progress again, and hopefully I can make a release candidate of v4.0 this week, unless I encounter unexpectedly difficult corner cases.

agronholm avatar Jul 10 '23 21:07 agronholm

Happy to report that I'm making progress again, and hopefully I can make a release candidate of v4.0 this week, unless I encounter unexpectedly difficult corner cases.

Awesome to hear!

reidmeyer avatar Jul 10 '23 21:07 reidmeyer