aiohttp
Timeout on one connection can cause CancelledError on other connections on DNS refresh
I ran into this on 2.3, but based on the code, the issue appears to still exist in the current version.
Conditions:
- DNS entry expires
- One task starts updating the DNS entry (and gets the lock)
- Another task sees the expired cached entry and waits for the first one
- The first task times out, so async-timeout cancels it. This raises a CancelledError inside the first task, which is stored on the lock and then re-raised in the second task
async-timeout on the first task knows about the CancelledError and handles it (by raising TimeoutError), but the second task has no such handling and surfaces the CancelledError as is.
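The sequence above can be sketched in isolation. This is a minimal illustration, not aiohttp's actual code: `EventResultOrError` here is a simplified stand-in for the helper in `locks.py`, and `slow_lookup`, `resolve_host`, `outcome` and the key `"example.com:80"` are hypothetical names. It uses `asyncio.wait_for` in place of async-timeout, and catches `BaseException` to mimic Python 3.4-3.7, where `CancelledError` was still a subclass of `Exception`.

```python
import asyncio


class EventResultOrError:
    """Hypothetical stand-in for the helper in aiohttp/locks.py:
    an Event whose waiters re-raise whatever exception was stored on it."""

    def __init__(self):
        self._event = asyncio.Event()
        self._exc = None

    def set(self, exc=None):
        self._exc = exc
        self._event.set()

    async def wait(self):
        await self._event.wait()
        if self._exc is not None:
            raise self._exc


async def slow_lookup():
    # Stand-in for a DNS query that takes longer than the caller's timeout.
    await asyncio.sleep(10)
    return ["127.0.0.1"]


async def resolve_host(throttle, key):
    if key in throttle:
        # Another task is already refreshing this entry: wait for it.
        await throttle[key].wait()
        return ["127.0.0.1"]
    throttle[key] = EventResultOrError()
    try:
        addrs = await slow_lookup()
        throttle[key].set()
        return addrs
    except BaseException as exc:
        # Assumption: BaseException mimics Python 3.4-3.7, where
        # CancelledError was still caught by "except Exception".
        throttle[key].set(exc=exc)
        raise
    finally:
        throttle.pop(key)


async def outcome(awaitable):
    # Report which exception, if any, the caller ends up seeing.
    try:
        await awaitable
        return "ok"
    except asyncio.TimeoutError:
        return "TimeoutError"
    except asyncio.CancelledError:
        return "CancelledError"


async def main():
    throttle = {}
    key = "example.com:80"
    # First request: has a timeout and starts the DNS refresh.
    first = asyncio.ensure_future(
        outcome(asyncio.wait_for(resolve_host(throttle, key), timeout=0.2))
    )
    await asyncio.sleep(0.05)  # let the first task register itself in throttle
    # Second request: no timeout of its own, just waits on the first.
    second = asyncio.ensure_future(outcome(resolve_host(throttle, key)))
    return await asyncio.gather(first, second)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the first request reports `TimeoutError`, while the second, which was never cancelled or timed out itself, reports `CancelledError` because the waiter re-raises the exception object stored by the first task.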
DNS refresh code:
https://github.com/aio-libs/aiohttp/blob/56e39cd1e7cac90711805baaa89bb52842d0aaed/aiohttp/connector.py#L785-L816
Locks code:
https://github.com/aio-libs/aiohttp/blob/56e39cd1e7cac90711805baaa89bb52842d0aaed/aiohttp/locks.py#L29-L38
Example stack trace:
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/helpers.py", line 99, in __iter__
ret = yield from self._coro
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/client.py", line 267, in _request
conn = yield from self._connector.connect(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 402, in connect
proto = yield from self._create_connection(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 749, in _create_connection
_, proto = yield from self._create_direct_connection(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 813, in _create_direct_connection
hosts = yield from self._resolve_host(req.url.raw_host, req.port)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 718, in _resolve_host
yield from self._throttle_dns_events[key].wait()
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/locks.py", line 34, in wait
raise self._exc
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/client.py", line 267, in _request
conn = yield from self._connector.connect(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 402, in connect
proto = yield from self._create_connection(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 749, in _create_connection
_, proto = yield from self._create_direct_connection(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 813, in _create_direct_connection
hosts = yield from self._resolve_host(req.url.raw_host, req.port)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 727, in _resolve_host
loop=self._loop)
File "/Users/siddh/python3.4/lib/python3.4/asyncio/futures.py", line 358, in __iter__
yield self  # This tells Task to wait for completion.
File "/Users/siddh/python3.4/lib/python3.4/asyncio/tasks.py", line 297, in _wakeup
future.result()
File "/Users/siddh/python3.4/lib/python3.4/asyncio/futures.py", line 266, in result
raise CancelledError
GitMate.io thinks the contributor most likely able to help you is @asvetlov.
Possibly related issues are https://github.com/aio-libs/aiohttp/issues/930 (aiohttp.Timeout causes CancelledError), https://github.com/aio-libs/aiohttp/issues/1799 (Unclosed connection), https://github.com/aio-libs/aiohttp/issues/1883 (ProcessPoolExecutor in background cause freeze after keep-alive timeout), https://github.com/aio-libs/aiohttp/issues/1463 (session get timeout and tcp connection limit don't work together), and https://github.com/aio-libs/aiohttp/issues/2374 (cancellederror still exist.).
@si2dharth that version of aiohttp and Python 3.4 ain't supported. Can you reproduce this on the modern aiohttp (master or 3.5) under Python >= 3.5.3?
Hmm.
Interesting, thank you @si2dharth
I've heard complaints like this several times.
I'm pretty sure that async_timeout works well.
You've pointed to the cached DNS resolver and locks.py; that could be the source of the problem.
I need to dig into the code.
This really needs a test to reproduce. The code doesn't appear to have changed much, so it might still be an issue. But, without a test, it's unlikely to get looked at.
I added cancellation coverage in https://github.com/aio-libs/aiohttp/pull/9454 so I'm pretty sure this can't happen anymore. If you can still see a problem, we will need a reproducer.
Thanks, I agree. I had the same instinct. I was trying to repro it with old dependencies to show it is fixed now, but wasn't able to get to it.