aiohttp icon indicating copy to clipboard operation
aiohttp copied to clipboard

(client) connections cannot be really closed before EOF with uvloop enabled

Open Rongronggg9 opened this issue 2 years ago • 18 comments

Describe the bug

If uvloop is enabled, as long as EOF is not reached, even if the response, connection, and session are all closed, the server from which aiohttp requested resources, will still keep sending packets to the aiohttp-client for some time. However, the bug is only reproducible with uvloop enabled.

To Reproduce

aiohttp[speedups]==3.8.1
uvloop==0.16.0
import asyncio
import aiohttp
import uvloop
import logging

logging.basicConfig(level=logging.DEBUG)


async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            logging.debug(response)
            logging.debug(response.content)


async def reproduce(wait):
    await fetch('https://sgp-ping.vultr.com/vultr.com.1000MB.bin')
    await asyncio.sleep(wait)  # now you will see that the file is still being downloaded


def main(enable_uvloop):
    if enable_uvloop:
        logging.info('********** Using uvloop **********')
        uvloop.install()  # the bug is only reproducible with uvloop
    else:
        logging.info('********** Using asyncio **********')
    loop = asyncio.new_event_loop()
    loop.set_debug(True)  # or set env PYTHONASYNCIODEBUG=1, not necessary to reproduce the bug
    asyncio.set_event_loop(loop)
    loop.run_until_complete(reproduce(15 if enable_uvloop else 5))
    loop.close()


if __name__ == '__main__':
    main(enable_uvloop=False)
    main(enable_uvloop=True)
    input('Press Enter to exit')

Expected behavior

The connection should be closed and no packet from the server could be sent to the client anymore, no matter with or without uvloop enabled, no matter reaching or not reaching EOF.

Logs/tracebacks

INFO:root:********** Using asyncio **********
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:asyncio:Get address info sgp-ping.vultr.com:443, type=<SocketKind.SOCK_STREAM: 1>, flags=<AddressInfo.AI_ADDRCONFIG: 32>
INFO:asyncio:Getting address info sgp-ping.vultr.com:443, type=<SocketKind.SOCK_STREAM: 1>, flags=<AddressInfo.AI_ADDRCONFIG: 32> took 209.423ms: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('45.32.100.168', 443))]
DEBUG:asyncio:<asyncio.sslproto.SSLProtocol object at 0x7fc80064dfa0> starts SSL handshake
DEBUG:asyncio:<asyncio.sslproto.SSLProtocol object at 0x7fc80064dfa0>: SSL handshake took 225.2 ms
DEBUG:asyncio:<asyncio.TransportSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.1.101', 51708), raddr=('45.32.100.168', 443)> connected to 45.32.100.168:443: (<asyncio.sslproto._SSLProtocolTransport object at 0x7fc7ff9b5a00>, <aiohttp.client_proto.ResponseHandler object at 0x7fc800205b20>)
DEBUG:root:<ClientResponse(https://sgp-ping.vultr.com/vultr.com.1000MB.bin) [200 OK]>
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Sat, 21 May 2022 00:29:55 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '1048576000', 'Last-Modified': 'Mon, 20 Sep 2021 19:54:01 GMT', 'Connection': 'keep-alive', 'Etag': '"6148e6d9-3e800000"', 'Expires': 'Sun, 22 May 2022 00:29:55 GMT', 'Cache-Control': 'max-age=86400', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'Access-Control-Allow-Origin': '*', 'Accept-Ranges': 'bytes')>

DEBUG:root:<StreamReader 15968 bytes>
DEBUG:asyncio:<asyncio.sslproto.SSLProtocol object at 0x7fc80064dfa0>: SSL error in data received
Traceback (most recent call last):
  File "/usr/lib/python3.9/asyncio/sslproto.py", line 534, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/usr/lib/python3.9/asyncio/sslproto.py", line 206, in feed_ssldata
    self._sslobj.unwrap()
  File "/usr/lib/python3.9/ssl.py", line 948, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2756)
DEBUG:asyncio:Close <_UnixSelectorEventLoop running=False closed=False debug=True>
INFO:root:********** Using uvloop **********
DEBUG:asyncio:<uvloop.loop.SSLProtocol object at 0x7fc8040f8b40> starts SSL handshake
DEBUG:asyncio:<uvloop.loop.SSLProtocol object at 0x7fc8040f8b40>: SSL handshake took 227.0 ms
DEBUG:root:<ClientResponse(https://sgp-ping.vultr.com/vultr.com.1000MB.bin) [200 OK]>
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Sat, 21 May 2022 00:30:01 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '1048576000', 'Last-Modified': 'Mon, 20 Sep 2021 19:54:01 GMT', 'Connection': 'keep-alive', 'Etag': '"6148e6d9-3e800000"', 'Expires': 'Sun, 22 May 2022 00:30:01 GMT', 'Cache-Control': 'max-age=86400', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'Access-Control-Allow-Origin': '*', 'Accept-Ranges': 'bytes')>

DEBUG:root:<StreamReader 15968 bytes>
Press Enter to exit

Python Version

Python 3.9.12 (main, Mar 24 2022, 13:02:21) [GCC 11.2.0] on linux

aiohttp Version

Name: aiohttp
Version: 3.8.1
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache 2
Location: /home/rongrong/venv/RSStT/lib/python3.9/site-packages
Requires: aiosignal, async-timeout, attrs, charset-normalizer, frozenlist, multidict, yarl
Required-by: aiohttp-retry, aiohttp-socks

multidict Version

Name: multidict
Version: 6.0.2
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/rongrong/venv/RSStT/lib/python3.9/site-packages
Requires: 
Required-by: aiohttp, yarl

yarl Version

Name: yarl
Version: 1.7.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/rongrong/venv/RSStT/lib/python3.9/site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

       _,met$$$$$gg.          rongrong@envy
    ,g$$$$$$$$$$$$$$$P.       -------------
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux bookworm/sid x86_64
 ,$$P'              `$$$.     Host: HP ENVY x360 Convertible 15-dr1xxx
',$$P       ,ggs.     `$$b:   Kernel: 5.17.0-1-amd64
`d$$'     ,$P"'   .    $$$    Uptime: 7 hours, 43 mins
 $$P      d$'     ,    $$P    Packages: 3453 (dpkg)
 $$:      $$.   -    ,d$$'    Shell: bash 5.1.16
 $$;      Y$b._   _,d$P'      Resolution: 1080x1920, 1920x1080
 Y$$.    `.`"Y$$$$P"'         DE: Plasma 5.24.5
 `$$b      "-.__              WM: KWin
  `Y$$                        WM Theme: Breeze 微风
   `Y$$.                      Theme: Breeze Light [Plasma], Breeze [GTK2/3]
     `$$b.                    Icons: breeze-dark [Plasma], breeze-dark [GTK2/3]
       `Y$$b.                 Terminal: konsole
          `"Y$b._             CPU: Intel i7-10510U (8) @ 4.900GHz
              `"""            GPU: NVIDIA GeForce MX250
                              GPU: Intel CometLake-U GT2 [UHD Graphics]
                              Memory: 16156MiB / 31774MiB

Related component

Client

Additional context

The bug has also been reported to uvloop: https://github.com/MagicStack/uvloop/issues/471

Code of Conduct

  • [X] I agree to follow the aio-libs Code of Conduct

Rongronggg9 avatar May 21 '22 00:05 Rongronggg9

Some statistics in my production environment. (In my use case only the response headers and/or the beginning 4KB of the response body is needed.)

Rongronggg9 avatar May 22 '22 01:05 Rongronggg9

That probably explains some of the frequent test failures with uvloop. Would be great if someone figures out the fix.

Dreamsorcerer avatar May 22 '22 13:05 Dreamsorcerer

I'd done some deep-dive and found that the bug is a TLS/SSL-relevant bug. Non-TLS connections can be closed even if uvloop is enabled.

-    await fetch('https://sgp-ping.vultr.com/vultr.com.1000MB.bin')
+    await fetch('http://sgp-ping.vultr.com/vultr.com.1000MB.bin')

Rongronggg9 avatar May 22 '22 14:05 Rongronggg9

I also analyzed the web traffic with Wireshark. With asyncio, RST is immediately sent once the connection was closed; but with uvloop, closing the connection won't change anything as if is it not closed, RST is only sent after the event loop is closed.

Rongronggg9 avatar May 22 '22 14:05 Rongronggg9

Another deep-dive:

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            logging.debug(response)
            logging.debug(response.content)
+            await asyncio.sleep(5)  # at this point, only a little part of the response is read, then pends
+            return  # after the connection is "closed", uvloop will resume reading the response

Rongronggg9 avatar May 22 '22 14:05 Rongronggg9

A deeper dive shows the direct cause of the bug.

async def reproduce(wait):
    await fetch('https://sgp-ping.vultr.com/vultr.com.1000MB.bin')
-    await asyncio.sleep(wait)  # now you will see that the file is still being downloaded
+    await asyncio.sleep(35)  # the download will last for exactly 30s, then abort

aiohttp closing its high-level connection causes uvloop to call _SSLProtocolTransport.close().

https://github.com/MagicStack/uvloop/blob/3e71ddc3383a8e0546519117c8775323c19eb6d7/uvloop/sslproto.pyx#L42-L51

     def close(self): 
         """Close the transport. 
  
         Buffered data will be flushed asynchronously.  No more data 
         will be received.  After all buffered data is flushed, the 
         protocol's connection_lost() method will (eventually) called 
         with None as its argument. 
         """ 
         self._closed = True 
         self._ssl_protocol._start_shutdown(self.context.copy()) 

https://github.com/MagicStack/uvloop/blob/3e71ddc3383a8e0546519117c8775323c19eb6d7/uvloop/sslproto.pyx#L559-L574

 cdef _start_shutdown(self, object context=None): 
     if self._state in (FLUSHING, SHUTDOWN, UNWRAPPED): 
         return 
     # we don't need the context for _abort or the timeout, because 
     # TCP transport._force_close() should be able to call 
     # connection_lost() in the right context 
     if self._app_transport is not None: 
         self._app_transport._closed = True 
     if self._state == DO_HANDSHAKE: 
         self._abort(None) 
     else: 
         self._set_state(FLUSHING) 
         self._shutdown_timeout_handle = \
             self._loop.call_later(self._ssl_shutdown_timeout, 
                                   lambda: self._check_shutdown_timeout()) 
         self._do_flush(context) 

We can easily verify that the low-level connection is closed 30 seconds after aiohttp have "closed" its high-level connection (as long as the response body is big enough). This complies with SSL_SHUTDOWN_TIMEOUT. That is to say, it is SSLProtocol._ssl_shutdown_timeout() that eventually closes the connection.

https://github.com/MagicStack/uvloop/blob/3e71ddc3383a8e0546519117c8775323c19eb6d7/uvloop/includes/consts.pxi#L24

 DEF SSL_SHUTDOWN_TIMEOUT = 30.0 

Now that SSLProtocol._ssl_shutdown_timeout() is called, the conclusion is evident that SSLProtocol._do_flush() causes the reading to be resumed but meanwhile is stuck in something until EOF or timeout is reached, causing the reading sustained.

https://github.com/MagicStack/uvloop/blob/3e71ddc3383a8e0546519117c8775323c19eb6d7/uvloop/sslproto.pyx#L600-L618

     cdef _do_flush(self, object context=None): 
         """Flush the write backlog, discarding new data received. 
  
         We don't send close_notify in FLUSHING because we still want to send 
         the remaining data over SSL, even if we received a close_notify. Also, 
         no application-level resume_writing() or pause_writing() will be called 
         in FLUSHING, as we could fully manage the flow control internally. 
         """ 
         try: 
             self._do_read_into_void(context) 
             self._do_write() 
             self._process_outgoing() 
             self._control_ssl_reading() 
         except Exception as ex: 
             self._on_shutdown_complete(ex) 
         else: 
             if not self._get_write_buffer_size(): 
                 self._set_state(SHUTDOWN) 
                 self._do_shutdown(context) 

Before uvloop fixes the bug, here's a simple hotfix to stop uvloop from reading the response happily and not willing to close the connection.

class AiohttpUvloopTransportHotfix(contextlib.AbstractAsyncContextManager):
    def __init__(self, response: aiohttp.ClientResponse):
        self.response = response

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if UVLOOP_ENABLED:
            self.response.connection.transport.abort()

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            async with AiohttpUvloopTransportHotfix(response):
                ...

CC @Dreamsorcerer

Rongronggg9 avatar May 24 '22 23:05 Rongronggg9

Impressive deep dive. If this is a uvloop bug, have you reported it there? A link to it here would be useful for tracking.

Dreamsorcerer avatar May 25 '22 16:05 Dreamsorcerer

I'd already opened a mirror issue before.

Additional context

The bug has also been reported to uvloop: https://github.com/MagicStack/uvloop/issues/471

Rongronggg9 avatar May 25 '22 16:05 Rongronggg9

Ah, great. I wonder if there's an easy way to put a temp fix in aiohttp, which would hopefully get our CI tests stable.

Dreamsorcerer avatar May 25 '22 16:05 Dreamsorcerer

Maybe we can just add those lines into ClientSession, possibly even just monkey patching it in the test code.

Dreamsorcerer avatar May 25 '22 17:05 Dreamsorcerer

Ah, great. I wonder if there's an easy way to put a temp fix in aiohttp, which would hopefully get our CI tests stable.

Maybe we can just add those lines into ClientSession, possibly even just monkey patching it in the test code.

Temporarily fixing it is not difficult, just call transport.abort() instead of transport.close() if the transport is a uvloop SSL protocol transport. But there is indeed a lot to do since there are a lot of transport.close() calls scattered in aiohttp. I am not really sure which ones are not affected by the uvloop bug. The problem also exists in the monkey-patching-in-test-code solution - when to monkey-patch and when not?

asyncio: graceful close (general behavior)

uvloop: hard abort (with my hotfix)

Rongronggg9 avatar May 25 '22 17:05 Rongronggg9

Hi @Rongronggg9, could this somehow lead to a memory leak? I'm facing an issue where after switching to aiohttp (3.8.1) we began getting high RAM consumption after spikes in requests but which doesn't drop afterwards.

We also use uvloop and make lots of TLS/SSL connections. I will try to test without uvloop to confirm.

beesaferoot avatar Sep 07 '22 07:09 beesaferoot

@beesaferoot Hmmm... I don't think so, because uvloop will eventually force abort the connection after 30s as I've described before. Also, in my usage, either with/without uvloop or with/without my uvloop hotfix, aiohttp never shows any potential to leak memory.

My advice is to check if there are unclosed responses, sessions, or so on. Also, if you are using CPython built with Glibc (in most cases it is exactly what you are using on Linux), you should consider the possibility of memory fragmentation (for more details: https://github.com/kurtmckee/feedparser/issues/287).

Rongronggg9 avatar Sep 07 '22 08:09 Rongronggg9

I see, I will investigate inline with your suggestion. Thanks for the insight

beesaferoot avatar Sep 07 '22 09:09 beesaferoot

It appears to be a fragmentation problem, as setting the env variable MALLOC_MMAP_THRESHOLD_ stops the ever increasing memory!

beesaferoot avatar Sep 07 '22 23:09 beesaferoot

uvloop v0.17.0 was just released, but still buggy

Rongronggg9 avatar Sep 15 '22 01:09 Rongronggg9

I've also tested the vanilla asyncio in CPython 3.11.0rc1/rc2 since it has "borrowed"[^1] the SSL implementation from uvloop. Still, the connection is closed as expected.

But a fatal error is logged:

DEBUG:asyncio:<asyncio.sslproto.SSLProtocol object at 0x7fdacf7879d0>: Fatal error on transport
Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/sslproto.py", line 646, in _do_shutdown
    self._sslobj.unwrap()
  File "/usr/lib/python3.11/ssl.py", line 983, in unwrap
    return self._sslobj.shutdown()
           ^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2672)

I consider the fatal error is pretty OK:

  • https://github.com/python/cpython/issues/84132

[^1]: https://github.com/python/cpython/issues/88177, https://github.com/python/cpython/pull/31275

Rongronggg9 avatar Sep 15 '22 04:09 Rongronggg9

So here locates the buggy code:

cdef _do_flush(self, object context=None):
    try:
        self._do_read_into_void(context)
        self._do_write()
        self._process_outgoing()
        self._control_ssl_reading()
    except Exception as ex:
        self._on_shutdown_complete(ex)
    else:
        if not self._get_write_buffer_size():
            self._set_state(SHUTDOWN)
            self._do_shutdown(context)

cdef _do_shutdown(self, object context=None):
    try:
        # we must skip all application data (if any) before unwrap
        self._do_read_into_void(context)  # <--- I am so sad because I am buggy
        try:
            self._sslobj.unwrap()
        except ssl_SSLAgainErrors as exc:
            self._process_outgoing()
        else:
            self._process_outgoing()
            if not self._get_write_buffer_size():
                self._on_shutdown_complete(None)
    except Exception as ex:
        self._on_shutdown_complete(ex)

Being commented out, uvloop shows 100% the same behavior as vanilla asyncio in CPython 3.11 (closing the connection as expected and logging the fatal error):

DEBUG:asyncio:<uvloop.loop.SSLProtocol object at 0x7f9cbfbcf5e0>: Error occurred during shutdown
Traceback (most recent call last):
  File "uvloop/sslproto.pyx", line 624, in uvloop.loop.SSLProtocol._do_shutdown
  File "/usr/lib/python3.11/ssl.py", line 983, in unwrap
    return self._sslobj.shutdown()
           ^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2672)

You may also inspect the source code of both uvloop and asyncio yourself. Vanilla asyncio in CPython 3.11 indeed does NOT read before unwrap in the shutdown stage.

def _do_flush(self):
    self._do_read()
    self._set_state(SSLProtocolState.SHUTDOWN)
    self._do_shutdown()

def _do_shutdown(self):
    try:
        if not self._eof_received:
            self._sslobj.unwrap()
    except SSLAgainErrors:
        self._process_outgoing()
    except ssl.SSLError as exc:
        self._on_shutdown_complete(exc)
    else:
        self._process_outgoing()
        self._call_eof_received()
        self._on_shutdown_complete(None)

https://github.com/python/cpython/blob/3.11/Lib/asyncio/sslproto.py#L643

Rongronggg9 avatar Sep 15 '22 05:09 Rongronggg9

CC

The issue has been located: https://github.com/MagicStack/uvloop/issues/471#issuecomment-1247584288 https://github.com/MagicStack/uvloop/issues/471#issuecomment-1249170910 https://github.com/MagicStack/uvloop/issues/471#issuecomment-1264470819

However, more discussion is needed since the issue is actually caused by a widespread implementation practice violating the specification: https://github.com/MagicStack/uvloop/issues/471#issuecomment-1264648224

Rongronggg9 avatar Oct 02 '22 13:10 Rongronggg9