tornado icon indicating copy to clipboard operation
tornado copied to clipboard

asyncio task cancellations are orphaned

Open dave-shawley opened this issue 3 years ago • 4 comments

  • Tornado Version 6.1
  • Python Versions tested Python 3.9.1 (tags/v3.9.1:1e5d33e9), Python 3.7.9 (tags/v3.7.9:13c94747)

If a request handler is waiting on a task that is cancelled, then the asyncio.CancelledError escapes and the request handler is never "finished". Previous to Python 3.8, this would have been caught in web.RequestHandler._execute. I suspect that it should be caught today as well so that the request handler but I am not completely sure what the correct action to take is.

Currently, the control flow seems to go back to the ioloop where it is discarded which leaves the connection hanging and the request handler never "finishes".

The following very contrived example shows the problem:

#!/usr/bin/env python
import asyncio
import json
import logging

from tornado import ioloop, web


class Handler(web.RequestHandler):
    async def prepare(self):
        super().prepare()
        self.logger = logging.getLogger('Handler')

    async def get(self):
        self.logger.info('processing request')
        self.work = asyncio.create_task(asyncio.sleep(10))

        work_result, killed = await asyncio.gather(
            self.do_work(), self.killer())

        self.set_status(200)
        self.write(json.dumps({'work_result': work_result, 'killed': killed}))

    async def do_work(self):
        self.logger.info('doing work')
        try:
            await self.work
        except asyncio.CancelledError:
            self.logger.warning('work cancelled!')
            raise
        return 10

    async def killer(self):
        await asyncio.sleep(4)
        self.logger.info('cancelling work')
        self.work.cancel()
        return True

    def on_finish(self):
        super().on_finish()
        self.logger.info('request handler has finished')


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='%(levelname)-15s %(name)s: %(message)s')
    iol = ioloop.IOLoop.current()
    app = web.Application([(r'/', Handler)])
    app.listen(8000)
    try:
        iol.start()
    except KeyboardInterrupt:
        iol.stop()

When run under Python 3.7, curl'ing "127.0.0.1:8000/" results in the following log output and the client receives a 500 Internal Server Error.

DEBUG           asyncio: Using selector: KqueueSelector
INFO            asyncio: poll took 2537.233 ms: 1 events
INFO            Handler: processing request
INFO            Handler: doing work
INFO            asyncio: poll 3999.843 ms took 4002.786 ms: timeout
INFO            Handler: cancelling work
WARNING         Handler: work cancelled!
ERROR           tornado.application: Uncaught exception GET / (127.0.0.1)
HTTPServerRequest(protocol='http', host='127.0.0.1:8000', method='GET', uri='/', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/Users/daves/Temporary/coros/env37/lib/python3.7/site-packages/tornado/web.py", line 1704, in _execute
    result = await result
  File "tornado-testie", line 19, in get
    self.do_work(), self.killer())
concurrent.futures._base.CancelledError
ERROR           tornado.access: 500 GET / (127.0.0.1) 4011.11ms
INFO            Handler: request handler has finished

When run under Python 3.9, the same request results in the following log output and the client never receives a response.

DEBUG           asyncio: Using selector: KqueueSelector
INFO            Handler: processing request
INFO            Handler: doing work
INFO            Handler: cancelling work
WARNING         Handler: work cancelled!
ERROR           asyncio: Exception in callback _HandlerDelegate.execute.<locals>.<lambda>(<Task cancell...do/gen.py:867>) at /Users/daves/Temporary/coros/env39/lib/python3.9/site-packages/tornado/web.py:2326
handle: <Handle _HandlerDelegate.execute.<locals>.<lambda>(<Task cancell...do/gen.py:867>) at /Users/daves/Temporary/coros/env39/lib/python3.9/site-packages/tornado/web.py:2326 created at /Users/daves/opt/lib/python3.9/asyncio/events.py:80>
source_traceback: Object created at (most recent call last):
  File "/Users/daves/Temporary/coros/tornado-testie", line 51, in <module>
    iol.start()
  File "/Users/daves/Temporary/coros/env39/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "/Users/daves/opt/lib/python3.9/asyncio/base_events.py", line 596, in run_forever
    self._run_once()
  File "/Users/daves/opt/lib/python3.9/asyncio/base_events.py", line 1882, in _run_once
    handle._run()
  File "/Users/daves/opt/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
Traceback (most recent call last):
  File "/Users/daves/opt/lib/python3.9/asyncio/tasks.py", line 651, in sleep
    return await future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/daves/Temporary/coros/tornado-testie", line 27, in do_work
    await self.work
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/daves/Temporary/coros/env39/lib/python3.9/site-packages/tornado/web.py", line 1704, in _execute
    result = await result
  File "/Users/daves/Temporary/coros/tornado-testie", line 18, in get
    work_result, killed = await asyncio.gather(
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/daves/opt/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/daves/Temporary/coros/env39/lib/python3.9/site-packages/tornado/web.py", line 2326, in <lambda>
    fut.add_done_callback(lambda f: f.result())
asyncio.exceptions.CancelledError

When I add asyncio.CancelledError to the except clause on tornado/web.py:1707 , I get the same outward behavior in both Python versions though the stack traces are different.

dave-shawley avatar Feb 11 '21 13:02 dave-shawley

Oh, that's an unpleasant surprise. Python 3.8 changed CancelledError to be a subclass of BaseException instead of Exception, so it now escapes except blocks that used to catch it. Line 1707 should probably be a bare except: instead of except Exception:. And other uses of except Exception: should probably be audited to make sure there aren't more similar bugs.

Note that Tornado's support for cancellation is pretty rudimentary - we didn't support cancellation at all until integrating with asyncio, and there is very little testing of it with tornado, so you might encounter other rough edges like this.

bdarnell avatar Mar 13 '21 22:03 bdarnell

@bdarnell that is what I had suspected. I can't say that I have encountered anything really making use of cancellation in the wild but I figured that I'd note this one in case anyone else encounters it. I had tinkered with the idea of cancelling outstanding tasks during application shut down but not much actually handles cancellation ;)

dave-shawley avatar Mar 13 '21 22:03 dave-shawley

If I'm reading the Python asyncio docs correctly, this behavior might cause issues if code uses an asyncio.TaskGroup or asyncio.timeout(): https://docs.python.org/3/library/asyncio-task.html#task-cancellation

It is recommended that coroutines use try/finally blocks to robustly perform clean-up logic. In case asyncio.CancelledError is explicitly caught, it should generally be propagated when clean-up is complete. Most code can safely ignore asyncio.CancelledError.

The asyncio components that enable structured concurrency, like asyncio.TaskGroup and asyncio.timeout(), are implemented using cancellation internally and might misbehave if a coroutine swallows asyncio.CancelledError.

neilsh avatar Mar 23 '23 00:03 neilsh

@neilsh It's possible, but I don't see a problem in that area. The paragraphs you quoted talk about the importance of re-raising CancelledError if you catch it. The change to make CancelledError a BaseException instead of a regular Exception made it harder to catch, making it less likely that you'd catch it without reraising it (which would break TaskGroup and timeout). But it introduced the opposite bug in some places, such as this place in Tornado that intended to catch CancelledError but no longer does.

bdarnell avatar Mar 23 '23 16:03 bdarnell