TLS/SSL asyncio leaks memory
Bug report
Bug description:
Python 3.9 without uvloop doesn't leak memory (or the leak is noticeably smaller). Python 3.11+ (and possibly other versions) leaks memory A LOT under load (with or without uvloop) - up to +2 GB per test!
Test command: `ab -n50000 -c15000 -r https://127.0.0.1/` (apt install apache2-utils)
```python
import asyncio, ssl, uvloop

class HTTP(asyncio.Protocol):
    def __init__(self):
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(
            b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: keep-alive\r\n\r\n'
        )
        self.transport.close()

def make_tls_context():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain('cert.crt', 'cert.key')
    return ctx

tls_context = make_tls_context()
loop = uvloop.new_event_loop()

async def start_server(loop):
    return await loop.create_server(
        HTTP, '127.0.0.1', 443, backlog=65535,
        ssl=tls_context)

loop.run_until_complete(start_server(loop))
loop.run_forever()
```
CPython versions tested on:
3.9, 3.11, 3.12
Operating systems tested on:
Debian Linux
Linked PRs
- gh-113338
- gh-114074
- gh-114311
- gh-114367
- gh-114829
- gh-114830
- gh-115643
I can confirm that the issue exists: Python 3.11 with uvloop 0.17.0, when setting up many client SSL connections that sometimes reconnect, even directly with asyncio's create_connection; something is not being cleaned up properly somewhere.
Sometimes it leaked several megabytes per second for me.
I was able to track this by attaching gdb to a running Python process; when it leaked, I set a catchpoint on brk syscalls and examined 40 sequential catchpoint hits, all of which were related to /usr/lib/python3.10/ssl.py.
```
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.10/ssl.py", line 917, in read
    v = self._sslobj.read(len)
  <built-in method buffer_updated of uvloop.loop.SSLProtocol object at remote 0xffff758f90c0>
  <built-in method run of _contextvars.Context object at remote 0xffff758c7340>
  File "/home/ubuntu/py-core/env/lib/python3.10/site-packages/aiohttp/web.py", line 544, in run_app
    loop.run_until_complete(main_task)
  File "/home/ubuntu/py-core/server_app.py", line 39, in <module>
    web.run_app(app_factory(), loop=LOOP, host=args.host, port=args.port)
(gdb)
```
Also, the gc.get_objects() count was not actually growing, which is why I suspect this is related to a lower-level native implementation.
@1st1, @asvetlov, @graingert, @gvanrossum, @kumaraditya303 (as asyncio module experts)
Unfortunately I am really bad at this type of low level stuff. :-( If you suspect a leak in the C accelerator, you can disable it and see if the leak goes away. (Or at least becomes less severe.)
OTOH if the problem is in the SSL code, we need a different kind of expert.
Finally please figure out if this is in uvloop or not. If it’s tied to uvloop this is the wrong tracker.
Unfortunately, this bug was reproduced only in our production environment, and we weren't able to trigger it elsewhere.
It's already mitigated, as the root cause was poor connection stability.
I don't think I'd be able to reproduce it again in production without uvloop, because it caused a lot of harm to our system.
But I'm ready to assist and provide more details if anyone wants to ask clarifying questions about it.
Also, there are several related tickets, and I'm not sure if they are all about uvloop:
- https://github.com/python/cpython/issues/106684 - a very similar CPython issue, already closed though.
- https://github.com/aio-libs/aiohttp/issues/7252 - also already closed, but a recent aiohttp issue; they were able to mitigate it somehow, but I'm not sure if it was the same bug.
I actually was able to reproduce this leak in isolation several days ago, but it leaked only ~5 megabytes over several hours. I adapted a repro script from this comment on the aiohttp bug mentioned above so it was ready to run, and ran it for about 4 hours.
The memory consumption (RSS) initially stabilised at 258120 but after several hours increased to ~264000. I hope it's the same bug.
```python
#!/usr/bin/env python3
import aiohttp
import tracemalloc
import ssl
import asyncio

def init():
    timeout = aiohttp.ClientTimeout(connect=5, sock_read=5)
    ssl_ctx = ssl._create_unverified_context()
    conn = aiohttp.TCPConnector(ssl=ssl_ctx, enable_cleanup_closed=True)
    session = aiohttp.ClientSession(connector=conn, timeout=timeout,
                                    cookie_jar=aiohttp.CookieJar(unsafe=True))
    return session

async def fetch(client):
    try:
        async with client.request('GET', url='https://api.pro.coinbase.com/products/BTC-USD/ticker') as r:
            msg = await r.text()
            print(msg)
    except asyncio.CancelledError:
        raise
    except Exception as err:
        print("error", err)

async def main():
    requests = 600
    clients = [init() for _ in range(requests)]
    tracemalloc.start()
    try:
        while True:
            await asyncio.gather(*[fetch(client) for client in clients])
            await asyncio.sleep(5)
    except asyncio.CancelledError:
        pass  # end and clean things up
    finally:
        memory_used = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
        stats = snapshot.statistics('lineno')
        for stat in stats[:10]:
            print(stat)
        try:
            # clients is a list, so iterate it directly
            # (the original called clients.values(), which raises AttributeError)
            await asyncio.gather(*[client.close() for client in clients])
        except Exception:
            pass

asyncio.run(main())
```
Could you find a reproducer without aiohttp?
Unsure if this is helpful or related, but I came across this discussion: https://github.com/encode/uvicorn/discussions/2078. In that thread it is discussed and concluded that the issue of memory not being released is not isolated to uvicorn but is also seen in granian/gunicorn/hypercorn, and could therefore be at the interpreter level (apologies for butchering the summary). The thread has some great charts and analysis. It is at the server level, but the differing implementations of granian and uvicorn may help approximate where the issue is surfacing, if it's related. Example repo: https://github.com/Besedo/memory_issue/tree/main by @EBazarov. Apologies in advance if it is unrelated; I will remove the comment and create one in the right place.
gi0baro:
Given everything we stated in the replies above, I suspect there's not much to do about this. Given the number of frameworks and servers tested, it just seems the Python interpreter is the culprit here, as even if the heap memory gets released, the RSS remains high.
Upstream is pretty hard pressed to debug this (I'm no web developer or admin). Can I ask one of the framework owners or users who are struggling with this to help us reduce their example to the point where we have a small self-contained program that demonstrates the issue?
> Upstream is pretty hard pressed to debug this (I'm no web developer or admin). Can I ask one of the framework owners or users who are struggling with this to help us reduce their example to the point where we have a small self-contained program that demonstrates the issue?
I provided a working example of the leak; testing with ab gives a 1+ GB/min leak (OpenSSL version 1.1). I can provide access to a dedicated server where this bug happens (more than one server is affected).
IMPORTANT: there is no problem when using Python 3.9 WITHOUT uvloop; with uvloop it leaks, without it does not. BUT Python 3.12 leaks even more, with or without uvloop...
Shortened the MVP even more:
```python
import asyncio, ssl, uvloop

class HTTP(asyncio.Protocol):
    def __init__(self):
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(
            b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: keep-alive\r\n\r\n'
        )
        self.transport.close()

def make_tls_context():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain('cert.crt', 'cert.key')
    return ctx

tls_context = make_tls_context()
loop = uvloop.new_event_loop()
#loop = asyncio.new_event_loop()

async def start_server(loop):
    return await loop.create_server(
        HTTP, '127.0.0.1', 443, backlog=65535,
        ssl=tls_context)

loop.run_until_complete(start_server(loop))
loop.run_forever()
```
Updated the first post for clarity. The problem is not with OpenSSL itself; it leaks without uvloop even on Python 3.12.
Example certificates: certs.zip
I only have a Mac, but I got the example to work and even managed to install the apache2 utils. I modified the example to print the process size once a second (rather than bothering with graphs) using psutil. Then it got messy: the ab tool often exits with an error. It does show a huge increase in process size when ab sort-of works.
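The per-second process-size printout described above can also be done without psutil. Here is a sketch using the stdlib `resource` module (the function names are mine; note `ru_maxrss` is the *peak* RSS, reported in kilobytes on Linux and bytes on macOS, so it only ever grows):

```python
# Sketch: report the process's peak RSS once per second from inside the
# event loop, using only the stdlib (psutil's Process().memory_info().rss
# would give the current RSS instead, if psutil is installed).
import asyncio
import resource

def peak_rss():
    # Kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

async def report_memory(interval: float = 1.0):
    while True:
        print(f"peak RSS: {peak_rss()}")
        await asyncio.sleep(interval)

# In the repro, schedule this alongside the server before run_forever():
#   loop.create_task(report_memory())
```

Since `ru_maxrss` is a high-water mark, a leak shows up as a steadily climbing number even if memory is later returned.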
Now I've got an idea. Given the complexity of the ssl module (a C extension that wraps OpenSSL) it is much more likely that the problem (if there is really a problem) is due to the code in that extension module than in asyncio, whose TLS support is just a small amount of Python code.
So maybe someone who is interested in getting to the bottom of this issue could rewrite the example without the use of asyncio. Instead you could use the socketserver module. This is ancient tech, but showing the leak when just using this will prove that the problem is in the ssl extension. We can then try to narrow down what happens in the ssl module to find the leak (I'm suspecting there's some path through the code that doesn't release a buffer it owns, but that covers a lot of cases).
(If you can't figure out how to use socketserver with TLS, maybe just a basic multi-threaded, hand-written serving loop could work. In either case you have to make sure to follow all the rules for serving TLS, of course.)
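A minimal sketch of that socketserver approach, under the assumption that wrapping the accepted socket in `get_request()` is good enough for this experiment (the class names and cert paths are mine, not from the issue):

```python
# Sketch: a threaded TLS server with no asyncio at all, to check whether
# the leak reproduces with just the ssl extension. Each accepted socket
# is wrapped (and handshaked) in get_request().
import socketserver
import ssl

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        self.rfile.readline()  # read (the first line of) the request
        self.wfile.write(
            b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
        )

class TLSServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, addr, handler, certfile, keyfile):
        super().__init__(addr, handler)
        self.ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        self.ctx.load_cert_chain(certfile, keyfile)

    def get_request(self):
        # Wrap the accepted plain socket in TLS before handing it to Handler.
        sock, addr = super().get_request()
        return self.ctx.wrap_socket(sock, server_side=True), addr

# To run against ab, e.g.:
#   TLSServer(("127.0.0.1", 443), Handler, "cert.crt", "cert.key").serve_forever()
```

One caveat of this sketch: `wrap_socket` performs the handshake in the accept path, so a slow or broken client can stall accepts; for a leak hunt under ab that should not matter.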
@gvanrossum Important note: Python 3.9.2 without uvloop (pure asyncio) doesn't leak with the same code, but it does leak with uvloop. As far as I know, after Python 3.9 the SSL code was ported from uvloop to CPython?
The biggest leak I saw was on Python 3.12 (without uvloop).
This doesn’t really help. We need someone to try and find which objects are being leaked.
> This doesn’t really help. We need someone to try and find which objects are being leaked.
I tried different Python memory profilers with no results at all, and I don't know how to create a useful dump, but a core dump contains tons of server certificate info.
Might one of the magical functions in the gc module that give you all the objects be helpful?
objgraph is great for this https://objgraph.readthedocs.io/en/stable/#memory-leak-example
Sorry that this is without a repro (I will try to produce one anyway), but I also hit the same issue with Tornado + SSL. And I can confirm such a leak exists even without uvloop, and sometimes (if I respond 404 to a Cloudflare proxy a lot) it becomes significant. @rojamit, sorry, but in that case you should try valgrind; I will also try over the weekend. PS @rojamit, does your project use Cloudflare? I found a very annoying behaviour related to keep-alive that can also produce the illusion of a memory leak.
Summary
Findings:
- This appears to be an issue when the SSL handshake fails here
- When you turn off SSL verification, the memory leak is not observed.
- The key memory allocation that is not being cleaned up is at sslproto.py:275
Minimal reproduction
This snippet runs the server and runs two separate pings (one insecure) in a subprocess, capturing the tracemalloc snapshots before and after each ping.
The output is as follows
```
Δ Memory Allocations = 2727.56kb
Δ Memory Allocations = 6.66kb  # without SSL
```
Notice the n_iter=10
```python
import asyncio
import asyncio.sslproto
import ssl
import tracemalloc

class HTTP(asyncio.Protocol):
    def __init__(self):
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(
            b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: keep-alive\r\n\r\n"
        )
        self.transport.close()

def make_tls_context():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(".jamie/iss109534/server.crt", ".jamie/iss109534/server.key")
    return ctx

async def start_server(loop):
    tls_context = make_tls_context()
    return await loop.create_server(
        HTTP, "127.0.0.1", 4443, backlog=65535, ssl=tls_context, start_serving=True
    )

async def ping(delay: float = 1.0, n_iter: int = 1, insecure: bool = False):
    await asyncio.sleep(delay)
    # ------------------------------------------------------------------
    # Before
    current_1, _ = tracemalloc.get_traced_memory()
    # ------------------------------------------------------------------
    # Run the requests
    if insecure:
        cmd = "curl --insecure"
    else:
        cmd = "curl"
    for _ in range(n_iter):
        proc = await asyncio.create_subprocess_shell(
            f"{cmd} https://127.0.0.1:4443",
            stderr=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )
        await proc.communicate()
    # ------------------------------------------------------------------
    # After
    current_2, _ = tracemalloc.get_traced_memory()
    print(f"Δ Memory Allocations = {(current_2 - current_1)/1000:.2f}kb")

if __name__ == "__main__":
    tracemalloc.start()
    loop = asyncio.new_event_loop()
    loop.run_until_complete(start_server(loop))
    # Run with SSL verification
    loop.run_until_complete(ping(delay=0.5, n_iter=10))
    # Run without SSL verification
    loop.run_until_complete(ping(delay=0.5, insecure=True, n_iter=10))
    loop.close()
```
Trace malloc snapshot
I updated ping to also take a tracemalloc.take_snapshot before and after and we see that the key allocation is here: sslproto.py:275
```python
async def ping(delay: float = 1.0, n_iter: int = 1, insecure: bool = False):
    # ...
    snapshot_1 = tracemalloc.take_snapshot()
    # ...
    # Same as before
    # ...
    snapshot_2 = tracemalloc.take_snapshot()

    print('-' * 40)
    if insecure:
        print("Insecure")
    print(f"Δ Inner Memory Allocations = {(current_2 - current_1)/1000:.2f}kb")

    top_stats = snapshot_2.compare_to(snapshot_1, 'traceback')
    print("\n[ Top stat ]")
    for stat in top_stats[:1]:
        print(stat)
        for line in stat.traceback.format(limit=25):
            print("\t", line)
    print('-' * 40)
```
And, now we can see where the allocations are happening (with n_iter=1)
```
Δ Inner Memory Allocations = 283.75kb

[ Top stat ]
/Users/jamphan/git/github.com/ordinary-jamie/cpython/.jamie/iss109534/tmp02.py:91: size=256 KiB (+256 KiB), count=2 (+2), average=128 KiB
  File "/Users/jamphan/git/github.com/ordinary-jamie/cpython/.jamie/iss109534/tmp02.py", line 91
    loop.run_until_complete(ping(delay=0.5))
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/base_events.py", line 671
    self.run_forever()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/base_events.py", line 638
    self._run_once()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/base_events.py", line 1971
    handle._run()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/events.py", line 84
    self._context.run(self._callback, *self._args)
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 224
    transport = self._make_ssl_transport(
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 83
    ssl_protocol = sslproto.SSLProtocol(
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 275
    self._ssl_buffer = bytearray(self.max_size)
```
Dev Mode
Running with `python -X dev` further confirms this; we see the following warnings:

```
Error on transport creation for incoming connection
handle_traceback: Handle created at (most recent call last):
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/base_events.py", line 1963, in _run_once
    handle._run()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 966, in _read_ready
    self._read_ready_cb()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 998, in _read_ready__get_buffer
    self._protocol.buffer_updated(nbytes)
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 436, in buffer_updated
    self._do_handshake()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 561, in _do_handshake
    self._on_handshake_complete(exc)
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 586, in _on_handshake_complete
    self._wakeup_waiter(exc)
protocol: <__main__.HTTP object at 0x1039294b0>
transport: <asyncio.sslproto._SSLProtocolTransport object at 0x103d6a210>
Traceback (most recent call last):
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 235, in _accept_connection2
    await waiter
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 576, in _on_handshake_complete
    raise handshake_exc
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 557, in _do_handshake
    self._sslobj.do_handshake()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/ssl.py", line 917, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: TLSV1_ALERT_UNKNOWN_CA] tlsv1 alert unknown ca (_ssl.c:1000)
```
Edit
Using curl --cacert /path/to/cert.crt as opposed to curl -k will get the same results; just to confirm the insecure flag isn't an invalid test.
Edit 2: updated minimal reproduction
A simpler code example for replicating the issue...
```python
import asyncio
import ssl
import tracemalloc

async def main(certfile, keyfile):
    tracemalloc.start()

    # Start server with SSL
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ssl_context.load_cert_chain(certfile, keyfile)
    await asyncio.start_server(
        lambda _, w: w.write(b"\0"), "127.0.0.1", 4443, ssl=ssl_context
    )
    current_1, _ = tracemalloc.get_traced_memory()

    # Ping the server with cURL
    proc = await asyncio.create_subprocess_shell(
        "curl https://127.0.0.1:4443 2> /dev/null"
    )
    await proc.communicate()

    current_2, _ = tracemalloc.get_traced_memory()
    print(f"{(current_2 - current_1)/1000:.2f}KB")

if __name__ == "__main__":
    asyncio.run(
        main(certfile="server.crt", keyfile="server.key")
    )
```
@mjpieters Just curious -- have you ever experienced anything similar?
@ordinary-jamie
The key memory allocation that is not being cleaned is at
sslproto.py:275
Excellent find!! Can you think of a suitable place to free this memory? I think it must be passed into OpenSSL via a C wrapper and that C wrapper must somehow forget to free it? So it would have to be in the C code for _ssl where it receives that buffer.
I don't think the leak is in the Python code, even though that's where it's being allocated (Python objects don't easily leak attribute values).
> @mjpieters Just curious -- have you ever experienced anything similar?
I haven't, the code using asyncio HTTPS is relatively short-running (a few minutes at a time at most).
Hi @gvanrossum, I actually think our culprit is this memoryview reference to the bytearray.
As a quick spike (see this draft PR: https://github.com/python/cpython/pull/113338), I added a .release() and cleared the references to the bytearray in what appears to be the final method before we wake up the future and set the exception SSLProto._fatal_error and that seems to have fixed the memory leak!
Re-running the example above
```
Δ Memory Allocations = 59.38kb  # was 2727.56kb before
Δ Memory Allocations = 6.20kb
```
This should be the sequence of method calls that leads to the memory leak before waking up the waiter Future:

```
SSLProto.connection_made
└── SSLProto._start_handshake
    └── SSLProto._do_handshake
        └── SSLProto._on_handshake_complete
            ├── SSLProto._fatal_error
            └── SSLProto._wakeup_waiter
                └── _waiter.set_exception
```

The exception set on the waiter Future should propagate up to the BaseSelectorEventLoop._accept_connection task (which wraps _accept_connection2). This is where we create the SSLProtocol (which keeps a reference to the waiter future) via _make_ssl_transport.
SSL module doesn't have a reference to the bytearray
I'm not familiar with it, but from what I can tell the _ssl.c extension doesn't appear to hold a reference to the bytearray. The SSLProtocol does hold a reference to an SSLContext (asyncio.sslproto.SSLProtocol._sslcontext = asyncio.sslproto._create_transport_context(...)), but this structure doesn't appear to hold any references to our bytearray.
Socket weak reference is released
A quick side note, on the socket not being closed -- we can see that the sock reference is released in the do_handshake impl function. I'm not sure where we should be closing this:
```
$ lldb -f python.exe -- .jamie/iss109534/minimal.py
(lldb) breakpoint set --file Modules/_ssl.c --line 1033
(lldb) r
Process 83199 launched: '/Users/jamphan/git/github.com/ordinary-jamie/cpython/python.exe' (arm64)
1 location added to breakpoint 1
Process 83199 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001021ad98c _ssl.cpython-313d-darwin.so`_ssl__SSLSocket_do_handshake [inlined] _ssl__SSLSocket_do_handshake_impl(self=0x0000000102e13750) at _ssl.c:1033:9 [opt]
   1030     } while (err.ssl == SSL_ERROR_WANT_READ ||
   1031              err.ssl == SSL_ERROR_WANT_WRITE);
   1032     Py_XDECREF(sock);
-> 1033     if (ret < 1)
   1034         return PySSL_SetError(self, ret, __FILE__, __LINE__);
   1035     if (PySSL_ChainExceptions(self) < 0)
   1036         return NULL;
(lldb) frame variable sock
(PySocketSockObject *) sock = NULL
```
Good points. There are other places where that view goes out of scope though -- I don't recall, doesn't memoryview automatically release its reference when it is deleted? If not, that should be a common cause of leaks, so maybe it does, and the problem is that the view itself is held on by something else?
E.g. doing a simple search I found that get_buffer() in sslproto.py overwrites both buffer and view -- should this also release the view? And what about cases that don't end with _fatal_error()? Why don't those leak? (Or do they?)
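To the question of whether a memoryview releases automatically: it does release its export when garbage-collected, but while any export is alive the underlying bytearray buffer is pinned, which is easy to demonstrate:

```python
# While a memoryview export is alive, the bytearray cannot resize or
# release its buffer; any resize attempt raises BufferError.
buf = bytearray(256 * 1024)
view = memoryview(buf)
try:
    buf.clear()       # attempt to resize to 0 while exported
    pinned = False
except BufferError:
    pinned = True     # the buffer is pinned by the live export
view.release()
buf.clear()           # succeeds once the view is released
```

So a leak of this shape needs something (here, plausibly the protocol object reachable from the waiter's exception/traceback) to keep the view or the bytearray alive longer than expected.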
Sorry for the slow response on this, @gvanrossum. I will continue looking at this, but with my limited knowledge, I am not making much progress (if someone more experienced wants to take over)
This might be my misunderstanding of CPython, but when I tried tools such as macOS's leaks, I was not able to find any memory leaks.
Would this suggest that the reference to the SSLProtocol is simply being held a little longer than needed, so there is a delay in dereferencing the SSLProtocol object (and its memoryview/bytearray buffers)?
To support this, here's a little test setup:
- Run a server as above with `loop.run_forever()`
- Monitor `leaks` output: `pid=$(pgrep python.exe); watch -n 0.1 leaks $pid`
- Load the server with k6: `k6 run --vus 100000 --duration 1s load.js`
As k6 progresses it will reach a high point in physical footprint, and eventually a chunk of the physical footprint will be deallocated (notice the delta between peak and current in the final output of leaks):
```
Physical footprint:         108.3M
Physical footprint (peak):  298.0M
Idle exit: untracked
----
leaks Report Version: 4.0
Process 75055: 54222 nodes malloced for 83935 KB
Process 75055: 0 leaks for 0 total leaked bytes.
```
> doing a simple search I found that get_buffer() in sslproto.py overwrites both buffer and view -- should this also release the view?
The buffer is created here in _SelectorSocketTransport and is in turn passed back to SSLProtocol.buffer_updated. The ._incoming attribute is an ssl.MemoryBIO object, which releases the buffer after it is done writing, here.
I'm not surprised that leaks tool doesn't find much, Python has several layers of its own memory allocation on top of the system allocator.
But you're on to something: maybe you could rerun your test after adding a gc.collect() call to the end of _run_once() in base_events.py.
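Without patching base_events.py, roughly the same experiment can be run by scheduling a periodic collection on the loop. A sketch (the function name is mine):

```python
# Sketch: periodically force a full garbage collection on a running
# event loop, to test whether delayed collection explains the RSS growth.
import asyncio
import gc

def periodic_collect(loop, interval=1.0):
    gc.collect()
    loop.call_later(interval, periodic_collect, loop, interval)

# In the repro, before loop.run_forever():
#   periodic_collect(loop)
```

If RSS growth flattens with this in place, the problem is delayed collection of cycles holding the buffers rather than a true native-level leak.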