OSError: [Errno 9] Bad file descriptor with uvloop

Open tkukushkin opened this issue 9 months ago • 36 comments

Note: we are waiting for https://github.com/MagicStack/uvloop/pull/646 to get fixed upstream

Describe the bug

Hello! After updating aiohttp to 3.11.13 we see new errors.

To Reproduce

I don't know how to reproduce these errors, but it seems they are somehow connected with cancellation.

Expected behavior

No errors

Logs/tracebacks

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1123, in _wrap_create_connection
    connection = await self._loop.create_connection(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 2076, in create_connection
  File "uvloop/loop.pyx", line 2066, in uvloop.loop.Loop.create_connection
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/cian_http/client.py", line 159, in request
    async with self._get_session().request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client.py", line 1425, in __aenter__
    self._resp: _RetType = await self._coro
                           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client.py", line 703, in _request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 548, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1056, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1380, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/connector.py", line 1141, in _wrap_create_connection
    sock.close()
  File "/usr/local/lib/python3.12/socket.py", line 505, in close
    self._real_close()
  File "/usr/local/lib/python3.12/socket.py", line 499, in _real_close
    _ss.close(self)
OSError: [Errno 9] Bad file descriptor

Python Version

3.13.1

aiohttp Version

3.11.13

multidict Version

6.1.0

propcache Version

0.3.0

yarl Version

1.18.3

OS

Debian 12.9

Related component

Client

Additional context

No response

Code of Conduct

  • [x] I agree to follow the aio-libs Code of Conduct

tkukushkin avatar Mar 02 '25 20:03 tkukushkin

Also having trouble with file descriptors on the latest versions of aiohttp and aiohappyeyeballs. Not using uvloop.

BlockingIOError: [Errno 115] Operation now in progress
  File "asyncio/selector_events.py", line 509, in _sock_connect
    sock.connect(address)

RuntimeError: File descriptor 265 is used by transport <_SelectorSocketTransport fd=265 read=polling write=<idle, bufsize=0>>
  File "usr/local/lib/python3.10/site-packages/<internal library>.py", line 53, in mpub_asyncio
    async with self.session_asyncio.post(f"{self.url}/mpub", timeout=timeout, **kwargs) as r:
  File "aiohttp/client.py", line 1425, in __aenter__
    self._resp: _RetType = await self._coro
  File "ddtrace/contrib/trace_utils_async.py", line 35, in wrapper
    return await func(mod, pin, wrapped, instance, args, kwargs)
  File "ddtrace/contrib/aiohttp/patch.py", line 108, in _traced_clientsession_request
    resp = await func(*args, **kwargs)  # type: aiohttp.ClientResponse
  File "aiohttp/client.py", line 703, in _request
    conn = await self._connector.connect(
  File "ddtrace/contrib/aiohttp/patch.py", line 62, in connect
    result = await self.__wrapped__.connect(req, *args, **kwargs)
  File "aiohttp/connector.py", line 548, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "aiohttp/connector.py", line 1056, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "aiohttp/connector.py", line 1380, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "aiohttp/connector.py", line 1116, in _wrap_create_connection
    sock = await aiohappyeyeballs.start_connection(
  File "aiohappyeyeballs/impl.py", line 93, in start_connection
    raise first_exception
  File "aiohappyeyeballs/impl.py", line 71, in start_connection
    sock = await _connect_sock(
  File "aiohappyeyeballs/impl.py", line 163, in _connect_sock
    await loop.sock_connect(sock, address)
  File "asyncio/selector_events.py", line 499, in sock_connect
    self._sock_connect(fut, sock, address)
  File "asyncio/selector_events.py", line 515, in _sock_connect
    self._ensure_fd_no_transport(fd)
  File "asyncio/selector_events.py", line 248, in _ensure_fd_no_transport
    raise RuntimeError(
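
For context, this RuntimeError comes from a guard in asyncio's selector event loop: it refuses low-level operations on a file descriptor number that is still registered to an open transport. The sketch below is not a reproduction of the failure above. It only trips the same guard on purpose, assuming a selector-based loop (the default on Linux) and a throwaway listener on 127.0.0.1. The names `_hold_open` and `fd_guard_message` are made up for illustration.

```python
import asyncio
import socket


async def _hold_open(reader, writer):
    # Keep the server side open until the client disconnects.
    await reader.read()
    writer.close()


async def fd_guard_message() -> str:
    """Return the guard's error message, or "" if the loop has no such guard."""
    loop = asyncio.get_running_loop()
    server = await asyncio.start_server(_hold_open, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    await loop.sock_connect(sock, (host, port))
    # From here on the transport owns the socket and its fd.
    transport, _ = await loop.create_connection(asyncio.Protocol, sock=sock)
    try:
        # Touching the same fd through the low-level API trips the guard.
        loop.add_reader(sock.fileno(), lambda: None)
    except RuntimeError as exc:
        return str(exc)
    finally:
        transport.close()
        server.close()
    return ""


if __name__ == "__main__":
    print(asyncio.run(fd_guard_message()))
```

In the traceback above the same check fires inside `sock_connect`, which suggests a new socket received an fd number that the loop still associated with an older transport.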

mklokocka avatar Mar 03 '25 17:03 mklokocka

@mklokocka Are you using uvloop as well?

bdraco avatar Mar 03 '25 20:03 bdraco

Not using uvloop.

tkukushkin avatar Mar 03 '25 21:03 tkukushkin

Not using uvloop.

The trace back you posted has uvloop in it.

bdraco avatar Mar 03 '25 22:03 bdraco

I'm talking about @mklokocka's comment.

tkukushkin avatar Mar 03 '25 22:03 tkukushkin

@mklokocka Does the problem still happen if you disable the patching of the internals in https://github.com/DataDog/dd-trace-py/blob/main/ddtrace/contrib/internal/aiohttp/patch.py ?

The previous issues that were reported focused on the aiohappyeyeballs staggered race; however, the trace you have posted looks like it's using the non-staggered race path https://github.com/aio-libs/aiohappyeyeballs/blob/035d976dee1f5e731852649f3fffd4e1aca21825/src/aiohappyeyeballs/impl.py#L68

Which version of aiohappyeyeballs do you have installed?

bdraco avatar Mar 03 '25 22:03 bdraco

Please update to aiohappyeyeballs 2.4.8 and check if the problem goes away

bdraco avatar Mar 04 '25 03:03 bdraco

@bdraco Thanks, the project ran for a few hours with the latest aiohappyeyeballs without any issues.

mklokocka avatar Mar 04 '25 21:03 mklokocka

Thanks. If the problem reoccurs, let me know and we can reopen

bdraco avatar Mar 04 '25 21:03 bdraco

@bdraco Hello! We've updated aiohappyeyeballs to 2.6.1 and it hasn't helped with the errors from my original message.

tkukushkin avatar Mar 12 '25 12:03 tkukushkin

I've dug through the aiohappyeyeballs code again and can find no place where we can work around this. So for uvloop, we are waiting for https://github.com/MagicStack/uvloop/pull/646 to get fixed upstream.

In the meantime, you might be able to get away with disabling Happy Eyeballs on the connector if you don't have any use cases with non-working IPv6.

https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.TCPConnector

The amount of time in seconds to wait for a connection attempt to complete, before starting the next attempt in parallel. This is the “Connection Attempt Delay” as defined in RFC 8305. To disable Happy Eyeballs, set this to None. The default value recommended by the RFC is 0.25 (250 milliseconds).
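
A minimal sketch of what that could look like, assuming aiohttp 3.10 or newer (where `TCPConnector` accepts `happy_eyeballs_delay`). The helper names `connector_kwargs` and `make_session` are made up here, and aiohttp is imported lazily so the sketch can be read and loaded without it installed.

```python
import asyncio


def connector_kwargs(disable_happy_eyeballs: bool) -> dict:
    """Keyword arguments for aiohttp.TCPConnector (hypothetical helper)."""
    # None disables Happy Eyeballs; leaving the key out keeps the default delay.
    return {"happy_eyeballs_delay": None} if disable_happy_eyeballs else {}


async def make_session():
    # Imported here because aiohttp is a third-party dependency.
    import aiohttp

    connector = aiohttp.TCPConnector(**connector_kwargs(True))
    return aiohttp.ClientSession(connector=connector)


async def main() -> None:
    session = await make_session()
    try:
        pass  # issue requests with `session` here
    finally:
        await session.close()


if __name__ == "__main__":
    asyncio.run(main())
```

This only turns off the staggered connection attempts in the connector; it does not change anything in uvloop itself.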

bdraco avatar Mar 12 '25 18:03 bdraco

Hi @bdraco! We've tried setting happy_eyeballs_delay=None, but we're still encountering the same errors.

tkukushkin avatar Mar 14 '25 10:03 tkukushkin

Disabling happy eyeballs will only reduce the chance the problem can happen with uvloop. There isn't anything we can do further but wait for uvloop to fix the issue.

bdraco avatar Mar 14 '25 17:03 bdraco

In our case, all the domains we make requests to with aiohttp resolve to a single IPv4 address. Perhaps disabling Happy Eyeballs couldn't help us because Happy Eyeballs isn't used in the first place?

One thought keeps really bothering me – why don't we see any similar errors on aiohttp 3.9.5, but on 3.10-3.11.13, with Happy Eyeballs disabled, we do? As if something else has changed.

tkukushkin avatar Mar 14 '25 18:03 tkukushkin

https://github.com/aio-libs/aiohttp/pull/10464 was probably the most relevant change in 3.11.13

bdraco avatar Mar 14 '25 23:03 bdraco

One difference between how aiohappyeyeballs and that PR handle the close is that aiohappyeyeballs expects socket.close() to be able to raise and traps it in https://github.com/aio-libs/aiohappyeyeballs/blob/e3bd5bdf44f5d187802de6dcb08d27e1ca6da048/src/aiohappyeyeballs/impl.py#L227

It probably makes sense to re-raise the OSError from the socket.close() as client_error(req.connection_key, exc) from exc
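
To illustrate the two options, here is a standard-library sketch, not aiohttp's or aiohappyeyeballs' actual code. `ConnectError` is a hypothetical stand-in for the client error class, and `simulate_stale_fd` fakes the situation by closing the fd behind the socket object's back (POSIX only), which is roughly what the `Bad file descriptor` traceback implies happened.

```python
import errno
import os
import socket
from typing import Optional


class ConnectError(Exception):
    """Hypothetical stand-in for a client connector error."""


def simulate_stale_fd() -> socket.socket:
    """Return a socket object whose fd has already been closed elsewhere."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    os.close(sock.fileno())  # the socket object does not know about this
    return sock


def close_trapping(sock: socket.socket) -> Optional[int]:
    """Option 1: trap the OSError from close() and report its errno."""
    try:
        sock.close()
    except OSError as exc:
        return exc.errno
    return None


def close_or_wrap(sock: socket.socket) -> None:
    """Option 2: re-raise the OSError as a client-level error."""
    try:
        sock.close()
    except OSError as exc:
        raise ConnectError(f"closing socket failed: {exc}") from exc


if __name__ == "__main__":
    print(close_trapping(simulate_stale_fd()) == errno.EBADF)
    try:
        close_or_wrap(simulate_stale_fd())
    except ConnectError as exc:
        print(type(exc.__cause__).__name__)
```

Wrapping keeps the failure visible to callers as a connection error instead of a bare OSError escaping from cleanup code.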

bdraco avatar Mar 14 '25 23:03 bdraco

@tkukushkin Can you give https://github.com/aio-libs/aiohttp/pull/10551 a try?

bdraco avatar Mar 14 '25 23:03 bdraco

@bdraco Hello!

Sorry for not answering sooner; you made the release faster than I was able to check it.

We still see these errors with aiohttp 3.11.16 and aiohappyeyeballs 2.6.1.

One thought keeps really bothering me – why don't we see any similar errors on aiohttp 3.9.5, but on 3.10-3.11.13, with Happy Eyeballs disabled, we do? As if something else has changed.

And this is still relevant, everything works perfectly with 3.9.5 and we have a lot of errors with any version after. Once we had 700 such errors in 1 minute from one container.

I understand that the root cause of the problem is inside uvloop, but I still don't understand why aiohttp 3.9.5 works fine.

tkukushkin avatar Apr 15 '25 12:04 tkukushkin

@tkukushkin The problem only happens with uvloop if we pass a socket object in. In 3.9 we called loop.create_connection directly which doesn't (or is less likely to) trigger the bug in uvloop.

3.11 https://github.com/aio-libs/aiohttp/blob/98add82d7b9eddd88b8ff60e3783413750db9274/aiohttp/connector.py#L1122

3.9 https://github.com/aio-libs/aiohttp/blob/e057906e52ed0ee457d4199b762e917733b51fdb/aiohttp/connector.py#L1025
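
A rough sketch of the two call shapes, using plain asyncio against a throwaway local listener rather than aiohttp's real code paths. The function names are made up for illustration. The first shape lets the loop create and own the socket, the second connects a socket first and hands it over with `sock=`, which is also why the caller has to clean the socket up if the handover is cancelled or fails.

```python
import asyncio
import socket
from typing import Tuple


async def _hold_open(reader, writer):
    # Keep the server side open until the client disconnects.
    await reader.read()
    writer.close()


async def connect_by_address(host: str, port: int) -> asyncio.BaseTransport:
    """3.9-style shape: the loop creates and owns the socket."""
    loop = asyncio.get_running_loop()
    transport, _ = await loop.create_connection(asyncio.Protocol, host, port)
    return transport


async def connect_with_socket(host: str, port: int) -> asyncio.BaseTransport:
    """3.10+-style shape: connect a socket first, then pass it via sock=."""
    loop = asyncio.get_running_loop()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        await loop.sock_connect(sock, (host, port))
        transport, _ = await loop.create_connection(asyncio.Protocol, sock=sock)
    except BaseException:
        # On cancellation or error the caller still owns the socket.
        try:
            sock.close()
        except OSError:
            pass  # the fd may already be gone
        raise
    return transport


async def demo() -> Tuple[bool, bool]:
    server = await asyncio.start_server(_hold_open, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    t1 = await connect_by_address("127.0.0.1", port)
    t2 = await connect_with_socket("127.0.0.1", port)
    # Both transports should report the listener as their peer.
    result = (
        t1.get_extra_info("peername")[1] == port,
        t2.get_extra_info("peername")[1] == port,
    )
    t1.close()
    t2.close()
    server.close()
    return result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```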

bdraco avatar Apr 15 '25 17:04 bdraco

Got it, thanks a lot! Is it possible to work around this problem somehow? Because 700 errors in one minute is too much.

tkukushkin avatar Apr 15 '25 17:04 tkukushkin

I haven’t tested this, but something along these lines might help work around the issue. That said, I’m hesitant to take another stab at shipping a uvloop workaround—we’ve been burned a few too many times trying to patch it ourselves. At this point, I’d rather hold off and wait for this PR to get merged upstream.

diff --git a/aiohttp/connector.py b/aiohttp/connector.py
index 08e6ae275..46386a987 100644
--- a/aiohttp/connector.py
+++ b/aiohttp/connector.py
@@ -1123,6 +1123,15 @@ class TCPConnector(BaseConnector):
             async with ceil_timeout(
                 timeout.sock_connect, ceil_threshold=timeout.ceil_threshold
             ):
+                if self._happy_eyeballs_delay is None:
+                    first_addr_infos = addr_infos[0]
+                    address_tuple = first_addr_infos[4]
+                    host: str = address_tuple[0]
+                    port: int = address_tuple[1]
+                    return await self._loop.create_connection(
+                        host, port, *args, **kwargs
+                    )
+                else:
                     sock = await aiohappyeyeballs.start_connection(
                         addr_infos=addr_infos,
                         local_addr_infos=self._local_addr_infos,
@@ -1131,7 +1140,9 @@ class TCPConnector(BaseConnector):
                         loop=self._loop,
                         socket_factory=self._socket_factory,
                     )
-                return await self._loop.create_connection(*args, **kwargs, sock=sock)
+                    return await self._loop.create_connection(
+                        *args, **kwargs, sock=sock
+                    )
         except cert_errors as exc:
             raise ClientConnectorCertificateError(req.connection_key, exc) from exc
         except ssl_errors as exc:
@@ -1324,7 +1335,10 @@ class TCPConnector(BaseConnector):
                 )
             except (ClientConnectorError, asyncio.TimeoutError) as exc:
                 last_exc = exc
-                aiohappyeyeballs.pop_addr_infos_interleave(addr_infos, self._interleave)
+                aiohappyeyeballs.pop_addr_infos_interleave(
+                    addr_infos,
+                    None if self._happy_eyeballs_delay is None else self._interleave,
+                )
                 continue
 
             if req.is_ssl() and fingerprint:
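
For readers unfamiliar with the `addr_infos` layout the diff indexes into: each entry follows the `socket.getaddrinfo` tuple shape `(family, type, proto, canonname, sockaddr)`, so index 4 is the socket address. A small standalone sketch of that extraction (the function name `first_host_port` is made up, and the addresses are documentation examples):

```python
import socket
from typing import Sequence, Tuple


def first_host_port(addr_infos: Sequence[tuple]) -> Tuple[str, int]:
    """Return (host, port) from the first getaddrinfo-style entry."""
    family, type_, proto, canonname, sockaddr = addr_infos[0]
    # sockaddr is (host, port) for IPv4 and
    # (host, port, flowinfo, scope_id) for IPv6.
    return sockaddr[0], sockaddr[1]


if __name__ == "__main__":
    v4 = [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "",
           ("192.0.2.10", 443))]
    v6 = [(socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP, "",
           ("2001:db8::1", 443, 0, 0))]
    print(first_host_port(v4))
    print(first_host_port(v6))
```

With the host and port in hand, the patched branch can call `loop.create_connection(host, port, ...)` directly and never pass a pre-connected socket to the loop.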

bdraco avatar Apr 15 '25 18:04 bdraco

I would really like to test this workaround, but I need wheels. Maybe it's possible to make a test release (like a release candidate), so it will not be delivered to people who will not request this specific test version? IMHO it could be a good approach to test other fixes in future as well.

tkukushkin avatar Apr 15 '25 18:04 tkukushkin

It would take me a couple of hours to put something like that together, since I’d need to adjust the existing tests. I’ve already spent a few weeks trying to work around this issue in uvloop, so it’s a bit discouraging to keep pushing on it—especially when the proposed fix hasn’t gotten any attention upstream.

bdraco avatar Apr 15 '25 18:04 bdraco

Thank you so much for your efforts trying to work around this issue on the aiohttp side. Unfortunately, the fix in uvloop has not moved for six months, so I believe some workaround is still needed.

tkukushkin avatar Apr 15 '25 18:04 tkukushkin

I'll leave this open, and when I have some cycles, I'll take a look at it again.

bdraco avatar Apr 15 '25 18:04 bdraco

Also, in the meantime, I added a note to the top of this issue and pinned it, in the hopes that someone has a relationship with the uvloop maintainers, who will see this issue, and might be able to help get the fix moving upstream.

bdraco avatar Apr 15 '25 18:04 bdraco

Also, since you're using Python 3.13, have you considered disabling uvloop?

https://github.com/MagicStack/uvloop/issues/566#issuecomment-2424812498

bdraco avatar Apr 15 '25 18:04 bdraco

Yes, I've tried disabling uvloop, but the performance impact was too significant in our case.

I have three ideas at the moment:

  1. Build uvloop wheels with the fix and publish them to our local PyPI.
  2. Try to monkeypatch aiohttp with your new workaround.
  3. Pin aiohttp to 3.9.5.

tkukushkin avatar Apr 15 '25 18:04 tkukushkin

One and two are probably decent strategies. I wouldn't do three, because you're gonna end up in a panic the next time there's some type of vulnerability that has to be fixed.

bdraco avatar Apr 15 '25 19:04 bdraco

Yes, I agree, and we already need some fixes from newer releases in one of our services.

tkukushkin avatar Apr 15 '25 19:04 tkukushkin