Timeout on one connection can cause CancelledError on other connections on DNS refresh

Open si2dharth opened this issue 7 years ago • 4 comments

I ran into this on aiohttp 2.3, but based on the code it looks like the issue still exists in the current version.

Conditions:

  • DNS entry expires
  • One task starts updating the DNS entry (and gets the lock)
  • Another task sees the expired cached entry and waits for first one
  • First task times out, causing async-timeout to cancel the task. This raises a CancelledError inside the DNS refresh code. The exception is then passed on to the lock and re-raised in the second task

While async-timeout for the first task knows about the CancelledError and handles it (by raising TimeoutError), the second task never timed out itself, so it gets the raw CancelledError.
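
The sequence above can be modelled with a small, self-contained sketch. This is not aiohttp's code: `ResultOrErrorEvent`, `slow_resolve`, `resolve`, `outcome` and `demo` are hypothetical names, `asyncio.wait_for` stands in for async-timeout, and modern `async`/`await` syntax is used instead of `yield from`. The handler catches `BaseException` on the assumption that the refresh code sees the cancellation, as it would on Python versions where `CancelledError` derived from `Exception` (before 3.8).

```python
import asyncio


class ResultOrErrorEvent:
    """Hypothetical event that can also hand an exception to its waiters."""

    def __init__(self):
        self._event = asyncio.Event()
        self._exc = None

    def set(self, exc=None):
        self._exc = exc
        self._event.set()

    async def wait(self):
        await self._event.wait()
        if self._exc is not None:
            # Waiters re-raise whatever the refreshing task stored.
            raise self._exc


async def slow_resolve():
    # Pretend DNS lookup that takes longer than the first task's timeout.
    await asyncio.sleep(1)
    return ["127.0.0.1"]


async def resolve(events, key):
    if key in events:
        # Another task is already refreshing this entry: wait for it.
        await events[key].wait()
        return "refreshed by other task"
    events[key] = ResultOrErrorEvent()
    try:
        addrs = await slow_resolve()
        events[key].set()
        return addrs
    except BaseException as exc:
        # Assumption: the handler also sees CancelledError and stores it.
        events[key].set(exc=exc)
        raise
    finally:
        events.pop(key)


async def outcome(coro):
    """Return the name of the exception a coroutine raised, or 'ok'."""
    try:
        await coro
    except BaseException as exc:
        return type(exc).__name__
    return "ok"


async def demo():
    events = {}
    # First task refreshes the entry but times out after 50 ms.
    first = asyncio.ensure_future(
        outcome(asyncio.wait_for(resolve(events, "host"), timeout=0.05))
    )
    await asyncio.sleep(0.01)  # let the first task take the "lock"
    # Second task has no timeout of its own and only waits on the event.
    second = asyncio.ensure_future(outcome(resolve(events, "host")))
    return await asyncio.gather(first, second)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Under these assumptions the first entry should be `TimeoutError` (the timeout wrapper translates its own cancellation) and the second `CancelledError`, even though nothing cancelled the second task directly.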

DNS refresh code:

https://github.com/aio-libs/aiohttp/blob/56e39cd1e7cac90711805baaa89bb52842d0aaed/aiohttp/connector.py#L785-L816

Locks code:

https://github.com/aio-libs/aiohttp/blob/56e39cd1e7cac90711805baaa89bb52842d0aaed/aiohttp/locks.py#L29-L38

Example stack trace:

File "/Users/siddh/lib/python3.4/site-packages/aiohttp/helpers.py", line 99, in __iter__
ret = yield from self._coro
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/client.py", line 267, in _request
conn = yield from self._connector.connect(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 402, in connect
proto = yield from self._create_connection(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 749, in _create_connection
_, proto = yield from self._create_direct_connection(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 813, in _create_direct_connection
hosts = yield from self._resolve_host(req.url.raw_host, req.port)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 718, in _resolve_host
yield from self._throttle_dns_events[key].wait()
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/locks.py", line 34, in wait
raise self._exc
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/client.py", line 267, in _request
conn = yield from self._connector.connect(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 402, in connect
proto = yield from self._create_connection(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 749, in _create_connection
_, proto = yield from self._create_direct_connection(req)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 813, in _create_direct_connection
hosts = yield from self._resolve_host(req.url.raw_host, req.port)
File "/Users/siddh/lib/python3.4/site-packages/aiohttp/connector.py", line 727, in _resolve_host
loop=self._loop)
File "/Users/siddh/python3.4/lib/python3.4/asyncio/futures.py", line 358, in __iter__
yield self # This tells Task to wait for completion.
File "/Users/siddh/python3.4/lib/python3.4/asyncio/tasks.py", line 297, in _wakeup
future.result()
File "/Users/siddh/python3.4/lib/python3.4/asyncio/futures.py", line 266, in result
raise CancelledError

si2dharth avatar Dec 06 '18 07:12 si2dharth

GitMate.io thinks the contributor most likely able to help you is @asvetlov.

Possibly related issues are https://github.com/aio-libs/aiohttp/issues/930 (aiohttp.Timeout causes CancelledError), https://github.com/aio-libs/aiohttp/issues/1799 (Unclosed connection), https://github.com/aio-libs/aiohttp/issues/1883 (ProcessPoolExecutor in background cause freeze after keep-alive timeout), https://github.com/aio-libs/aiohttp/issues/1463 (session get timeout and tcp connection limit don't work together), and https://github.com/aio-libs/aiohttp/issues/2374 (cancellederror still exist.).

aio-libs-bot avatar Dec 06 '18 07:12 aio-libs-bot

@si2dharth that version of aiohttp and Python 3.4 ain't supported. Can you reproduce this on the modern aiohttp (master or 3.5) under Python >= 3.5.3?

webknjaz avatar Dec 06 '18 08:12 webknjaz

Hmm, interesting. Thank you, @si2dharth. I've heard complaints like this several times. I'm pretty sure that async_timeout itself works well. You've pointed at the cached DNS resolver and locks.py, which could be the source of the problem. I need to dig into the code.

asvetlov avatar Dec 06 '18 18:12 asvetlov

This really needs a test to reproduce. The code doesn't appear to have changed much, so it might still be an issue. But, without a test, it's unlikely to get looked at.

Dreamsorcerer avatar Aug 10 '24 21:08 Dreamsorcerer

I added cancellation coverage in https://github.com/aio-libs/aiohttp/pull/9454 so I'm pretty sure this can't happen anymore. If you can still see a problem, we will need a reproducer.

bdraco avatar Oct 10 '24 17:10 bdraco

Thanks, I agree. I had the same instinct. I was trying to repro it with old dependencies to show it is fixed now, but wasn't able to get to it.

si2dharth avatar Oct 11 '24 19:10 si2dharth