OSError: [Errno 9] Bad file descriptor with uvloop
Note: we are waiting for https://github.com/MagicStack/uvloop/pull/646 to be merged upstream
Describe the bug
Hello! After updating aiohttp to 3.11.13 we see new errors.
To Reproduce
I don't know how to reproduce these errors, but it seems they are somehow connected with cancellation.
Expected behavior
No errors
Logs/tracebacks
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1123, in _wrap_create_connection
connection = await self._loop.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 2076, in create_connection
File "uvloop/loop.pyx", line 2066, in uvloop.loop.Loop.create_connection
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/cian_http/client.py", line 159, in request
async with self._get_session().request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiohttp/client.py", line 1425, in __aenter__
self._resp: _RetType = await self._coro
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiohttp/client.py", line 703, in _request
conn = await self._connector.connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 548, in connect
proto = await self._create_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1056, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1380, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1141, in _wrap_create_connection
sock.close()
File "/usr/local/lib/python3.12/socket.py", line 505, in close
self._real_close()
File "/usr/local/lib/python3.12/socket.py", line 499, in _real_close
_ss.close(self)
OSError: [Errno 9] Bad file descriptor
Python Version
3.13.1
aiohttp Version
3.11.13
multidict Version
6.1.0
propcache Version
0.3.0
yarl Version
1.18.3
OS
Debian 12.9
Related component
Client
Additional context
No response
Code of Conduct
- [x] I agree to follow the aio-libs Code of Conduct
Also having trouble with file descriptors on the latest versions of aiohttp and aiohappyeyeballs. Not using uvloop.
BlockingIOError: [Errno 115] Operation now in progress
File "asyncio/selector_events.py", line 509, in _sock_connect
sock.connect(address)
RuntimeError: File descriptor 265 is used by transport <_SelectorSocketTransport fd=265 read=polling write=<idle, bufsize=0>>
File "usr/local/lib/python3.10/site-packages/<internal library>.py", line 53, in mpub_asyncio
async with self.session_asyncio.post(f"{self.url}/mpub", timeout=timeout, **kwargs) as r:
File "aiohttp/client.py", line 1425, in __aenter__
self._resp: _RetType = await self._coro
File "ddtrace/contrib/trace_utils_async.py", line 35, in wrapper
return await func(mod, pin, wrapped, instance, args, kwargs)
File "ddtrace/contrib/aiohttp/patch.py", line 108, in _traced_clientsession_request
resp = await func(*args, **kwargs) # type: aiohttp.ClientResponse
File "aiohttp/client.py", line 703, in _request
conn = await self._connector.connect(
File "ddtrace/contrib/aiohttp/patch.py", line 62, in connect
result = await self.__wrapped__.connect(req, *args, **kwargs)
File "aiohttp/connector.py", line 548, in connect
proto = await self._create_connection(req, traces, timeout)
File "aiohttp/connector.py", line 1056, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "aiohttp/connector.py", line 1380, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
File "aiohttp/connector.py", line 1116, in _wrap_create_connection
sock = await aiohappyeyeballs.start_connection(
File "aiohappyeyeballs/impl.py", line 93, in start_connection
raise first_exception
File "aiohappyeyeballs/impl.py", line 71, in start_connection
sock = await _connect_sock(
File "aiohappyeyeballs/impl.py", line 163, in _connect_sock
await loop.sock_connect(sock, address)
File "asyncio/selector_events.py", line 499, in sock_connect
self._sock_connect(fut, sock, address)
File "asyncio/selector_events.py", line 515, in _sock_connect
self._ensure_fd_no_transport(fd)
File "asyncio/selector_events.py", line 248, in _ensure_fd_no_transport
raise RuntimeError(
@mklokocka Are you using uvloop as well?
Not using uvloop.
Not using uvloop.
The traceback you posted has uvloop in it.
I'm talking about @mklokocka's comment
@mklokocka Does the problem still happen if you disable the patching of the internals in https://github.com/DataDog/dd-trace-py/blob/main/ddtrace/contrib/internal/aiohttp/patch.py ?
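A quick way to test that is to turn off just the Datadog aiohttp instrumentation before any sessions are created. This is a hedged sketch only; the `DD_TRACE_AIOHTTP_ENABLED` variable and the `patch_all(aiohttp=False)` override follow ddtrace's usual per-integration toggles, so verify them against the ddtrace version you actually run:

```python
# Sketch: disable only the Datadog aiohttp integration for a test run.
# Assumes ddtrace's standard per-integration toggles; verify for your version.
#
# Option 1: environment variable, set before the process starts:
#   DD_TRACE_AIOHTTP_ENABLED=false
#
# Option 2: selective patching in code, before any ClientSession is created:
import ddtrace

ddtrace.patch_all(aiohttp=False)
```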
The previous issues that were reported focused on the aiohappyeyeballs staggered race; however, the trace you posted looks like it's using the non-staggered race path https://github.com/aio-libs/aiohappyeyeballs/blob/035d976dee1f5e731852649f3fffd4e1aca21825/src/aiohappyeyeballs/impl.py#L68
Which version of aiohappyeyeballs do you have installed?
Please update to aiohappyeyeballs 2.4.8 and check if the problem goes away
@bdraco Thanks, the project ran for a few hours with the latest aiohappyeyeballs without any issues.
Thanks. If the problem reoccurs, let me know and we can reopen
@bdraco Hello! We've updated aiohappyeyeballs to 2.6.1 and it hasn't helped with the errors from my original message.
I've dug through the aiohappyeyeballs code again and can find no place where we can work around this. So for uvloop, we are waiting for https://github.com/MagicStack/uvloop/pull/646 to be merged upstream
In the meantime you might be able to get away with disabling Happy Eyeballs on the connector if you don't have any use cases with non-working IPv6:
https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.TCPConnector
The amount of time in seconds to wait for a connection attempt to complete, before starting the next attempt in parallel. This is the “Connection Attempt Delay” as defined in RFC 8305. To disable Happy Eyeballs, set this to None. The default value recommended by the RFC is 0.25 (250 milliseconds).
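For reference, applying that on the client side looks roughly like this; `happy_eyeballs_delay` is the documented TCPConnector parameter quoted above, and the URL is only a placeholder:

```python
import asyncio
import aiohttp

async def fetch(url: str) -> str:
    # happy_eyeballs_delay=None disables Happy Eyeballs: addresses are tried
    # sequentially instead of with staggered parallel connection attempts.
    connector = aiohttp.TCPConnector(happy_eyeballs_delay=None)
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(url) as resp:
            return await resp.text()

# asyncio.run(fetch("https://example.com"))
```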
Hi @bdraco! We've tried setting happy_eyeballs_delay=None, but we're still encountering the same errors.
Disabling happy eyeballs will only reduce the chance the problem can happen with uvloop. There isn't anything we can do further but wait for uvloop to fix the issue.
In our case, all the domains we make requests to with aiohttp resolve to a single IPv4. Perhaps disabling Happy Eyeballs couldn't help us because Happy Eyeballs isn't used in the first place?
One thought keeps really bothering me – why don't we see any similar errors on aiohttp 3.9.5, but on 3.10-3.11.13, with Happy Eyeballs disabled, we do? As if something else has changed.
https://github.com/aio-libs/aiohttp/pull/10464 was probably the most relevant change in 3.11.13
One difference between how aiohappyeyeballs and that pr handles the close is that aiohappyeyeballs expects the socket.close() to be able to raise and traps it in https://github.com/aio-libs/aiohappyeyeballs/blob/e3bd5bdf44f5d187802de6dcb08d27e1ca6da048/src/aiohappyeyeballs/impl.py#L227
It probably makes sense to re-raise the OSError from the socket.close() as client_error(req.connection_key, exc) from exc
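As a rough illustration of that suggestion (not the actual aiohttp code; the names `connection_key` and `client_error` only mirror the connector's internals for the sketch):

```python
import socket
from typing import Type

def close_sock_reraising(
    sock: socket.socket, connection_key, client_error: Type[Exception]
) -> None:
    # uvloop may have already closed the fd behind our back, in which case
    # sock.close() raises OSError(EBADF); surface it as the connector error
    # the caller already expects instead of letting a bare OSError escape.
    try:
        sock.close()
    except OSError as exc:
        raise client_error(connection_key, exc) from exc
```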
@tkukushkin Can you give https://github.com/aio-libs/aiohttp/pull/10551 a try?
@bdraco Hello!
Sorry for not answering sooner; you made the release before I was able to check it.
We still see these errors with aiohttp 3.11.16 and aiohappyeyeballs 2.6.1.
One thought keeps really bothering me – why don't we see any similar errors on aiohttp 3.9.5, but on 3.10-3.11.13, with Happy Eyeballs disabled, we do? As if something else has changed.
And this is still relevant: everything works perfectly with 3.9.5, and we get a lot of errors with any later version. At one point we had 700 such errors in one minute from a single container.
I understand that the root cause of the problem is inside uvloop, but I still don't understand why aiohttp 3.9.5 works fine.
@tkukushkin The problem only happens with uvloop if we pass a socket object in. In 3.9 we called loop.create_connection directly which doesn't (or is less likely to) trigger the bug in uvloop.
3.11 https://github.com/aio-libs/aiohttp/blob/98add82d7b9eddd88b8ff60e3783413750db9274/aiohttp/connector.py#L1122
3.9 https://github.com/aio-libs/aiohttp/blob/e057906e52ed0ee457d4199b762e917733b51fdb/aiohttp/connector.py#L1025
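Roughly, the shape of the call changed like this. This is a simplified illustration, not the actual connector code; the two helper functions are hypothetical, while `aiohappyeyeballs.start_connection` and `loop.create_connection` are the real calls involved:

```python
import aiohappyeyeballs

# aiohttp 3.9 style: the event loop creates and owns the socket itself.
async def connect_3_9(loop, factory, host, port, **kwargs):
    return await loop.create_connection(factory, host, port, **kwargs)

# aiohttp 3.10+/3.11 style: aiohappyeyeballs connects a socket first and hands
# it to the loop via sock=..., which is the path that trips the uvloop bug.
async def connect_3_11(loop, factory, addr_infos, **kwargs):
    sock = await aiohappyeyeballs.start_connection(addr_infos, loop=loop)
    return await loop.create_connection(factory, **kwargs, sock=sock)
```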
Got it, thanks a lot! Is it possible to work around this problem somehow? Because 700 errors in one minute is too much.
I haven’t tested this, but something along these lines might help work around the issue. That said, I’m hesitant to take another stab at shipping a uvloop workaround—we’ve been burned a few too many times trying to patch it ourselves. At this point, I’d rather hold off and wait for this PR to get merged upstream.
diff --git a/aiohttp/connector.py b/aiohttp/connector.py
index 08e6ae275..46386a987 100644
--- a/aiohttp/connector.py
+++ b/aiohttp/connector.py
@@ -1123,6 +1123,15 @@ class TCPConnector(BaseConnector):
             async with ceil_timeout(
                 timeout.sock_connect, ceil_threshold=timeout.ceil_threshold
             ):
+                if self._happy_eyeballs_delay is None:
+                    first_addr_infos = addr_infos[0]
+                    address_tuple = first_addr_infos[4]
+                    host: str = address_tuple[0]
+                    port: int = address_tuple[1]
+                    return await self._loop.create_connection(
+                        *args, **kwargs, host=host, port=port
+                    )
+
                 sock = await aiohappyeyeballs.start_connection(
                     addr_infos=addr_infos,
                     local_addr_infos=self._local_addr_infos,
@@ -1131,7 +1140,9 @@ class TCPConnector(BaseConnector):
                     loop=self._loop,
                     socket_factory=self._socket_factory,
                 )
-                return await self._loop.create_connection(*args, **kwargs, sock=sock)
+                return await self._loop.create_connection(
+                    *args, **kwargs, sock=sock
+                )
         except cert_errors as exc:
             raise ClientConnectorCertificateError(req.connection_key, exc) from exc
         except ssl_errors as exc:
@@ -1324,7 +1335,10 @@ class TCPConnector(BaseConnector):
                 )
             except (ClientConnectorError, asyncio.TimeoutError) as exc:
                 last_exc = exc
-                aiohappyeyeballs.pop_addr_infos_interleave(addr_infos, self._interleave)
+                aiohappyeyeballs.pop_addr_infos_interleave(
+                    addr_infos,
+                    None if self._happy_eyeballs_delay is None else self._interleave,
+                )
                 continue
 
             if req.is_ssl() and fingerprint:
I would really like to test this workaround, but I need wheels. Maybe it's possible to make a test release (like a release candidate), so it won't be delivered to people who don't explicitly request this specific test version? IMHO it could be a good approach for testing other fixes in the future as well.
It would take me a couple of hours to put something like that together, since I’d need to adjust the existing tests. I’ve already spent a few weeks trying to work around this issue in uvloop, so it’s a bit discouraging to keep pushing on it—especially when the proposed fix hasn’t gotten any attention upstream.
Thank you so much for your efforts trying to work around this issue on the aiohttp side, but unfortunately the fix in uvloop hasn't moved anywhere for six months, so I believe some workaround is still needed.
I'll leave this open, and when I have some cycles, I'll take a look at it again.
Also, in the meantime, I added a note to the top of this issue and pinned it, in the hopes that someone has a relationship with the uvloop maintainers, who will see this issue, and might be able to help get the fix moving upstream.
Also, since you're using Python 3.13, have you considered disabling uvloop?
https://github.com/MagicStack/uvloop/issues/566#issuecomment-2424812498
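For completeness, "disabling uvloop" usually just means not opting into its event loop; this sketch assumes the app currently does so via `uvloop.run()` or `uvloop.install()`:

```python
import asyncio

async def main() -> None:
    await asyncio.sleep(0)  # application entry point goes here

# Before: uvloop.run(main())  or  uvloop.install(); asyncio.run(main())
# After: fall back to the stock asyncio event loop.
asyncio.run(main())
```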
Yes, I've tried to disable uvloop, but the performance impact was significant enough in our case.
I have three ideas at the moment:
- build uvloop wheels with the fix and publish them to our local PyPI
- try to monkeypatch aiohttp with your new workaround (see the sketch after this list)
- pin aiohttp to 3.9.5
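For idea two, the mechanics could be as simple as swapping the method in at startup, before any connector or ClientSession is created. This is an untested sketch: `install_connector_workaround` and `patched_coro` are hypothetical names, `patched_coro` stands for a local copy of `TCPConnector._wrap_create_connection` with the diff above applied, and it relies on a private aiohttp method that can change between releases:

```python
import aiohttp.connector

def install_connector_workaround(patched_coro) -> None:
    """Replace the private method at import time, before any sessions exist.

    patched_coro is assumed to be a local copy of
    TCPConnector._wrap_create_connection with the workaround applied.
    """
    aiohttp.connector.TCPConnector._wrap_create_connection = patched_coro
```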
One and two are probably decent strategies; I wouldn't do three, because you're gonna end up in a panic the next time there's some type of vulnerability that has to be fixed.
Yes, I agree, and we already need some fixes from newer releases in one of our services.