Tokio Might Need a Safe Method to Refresh Runtime
hi!
I've been following many web frameworks built on tokio, including actix, hyper, axum, warp, salvo, and others. Many of these frameworks have open issues about memory leaks, but after investigating, I found that most of them do not actually leak memory: a process that does not immediately return freed memory to the operating system is not, strictly speaking, leaking in most cases.
For example, in a stress test I used reqwest on one Ubuntu machine to issue millions of concurrent HTTP/2 requests against another Ubuntu machine running a tokio-based web service. The server-side handlers simply slept asynchronously for 20 seconds.
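For context, the server side in these tests is essentially nothing more than the following sketch (axum shown here; the route, port, and response body are illustrative, not the exact benchmark code):

```rust
use std::time::Duration;
use axum::{routing::get, Router};

// Hold every request open for 20 seconds, then answer.
async fn slow_handler() -> &'static str {
    tokio::time::sleep(Duration::from_secs(20)).await;
    "ok"
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/slow", get(slow_handler));
    // axum 0.7-style serve; older versions go through hyper's Server directly.
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```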
On the server side, every framework behaves very similarly: it typically accepts all requests within about 4 seconds, sleeps for 20 seconds, and then responds. The client finishes processing all responses in around 30 seconds, with an average send-to-receive time of about 22 seconds. All of the tokio-based web frameworks perform well here.
However, after millions of concurrent requests, the process memory of actix and axum stays at 3.5 GB to 4.5 GB and never decreases on its own. Only when the memory allocator is switched to mimalloc does memory drop to 0.7 GB to 1.5 GB once the load ends. Many of the "memory leak" reports against Rust web frameworks look like this: without changing the allocator, memory does not shrink back after a burst of high concurrency.
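For anyone reproducing this, switching the global allocator to mimalloc is a small change (assuming the mimalloc crate is added to Cargo.toml):

```rust
use mimalloc::MiMalloc;

// Route all heap allocations in this process through mimalloc instead of
// the system allocator.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
```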
In the issue trackers of tokio, actix, axum, and hyper, when memory-leak reports come up, contributors often reply that keeping memory instead of returning it to the system improves the performance of future allocations. However, I have seen no evidence that releasing memory significantly hurts server performance. Across multiple rounds of testing, whether starting from a fresh process or running with mimalloc, and regardless of how much memory the process was holding, client response times in the next round of million-level concurrency were essentially the same. Server-side handling performance stays consistent, which makes it hard to support the claim that retaining memory meaningfully speeds up future allocations.
For the server, if an attacker targets a slow web API and launches a massive concurrent attack, the consumed memory will not go away on its own even if the process never crashes; only a restart brings it back to normal. For example, if your usual QPS is a few hundred to a few thousand and regular memory usage is only 100 MB, after such an attack the process might sit at 4.5 GB and never shrink. This is unfavorable for operations, since monitoring data should reflect reality as closely as possible, and the behavior of tokio-based web frameworks in this regard is quite troubling.
However, there are workarounds. For instance, when using axum or hyper, I tried declaring a global RUNTIME and spawning tasks to handle requests with RUNTIME.spawn(async move { ... }) inside the accept loop. When a refresh is needed, I use std::mem::replace in a lock-free manner to swap in a fresh global RUNTIME. The replaced runtime waits asynchronously for a while and is then shut down with rt.shutdown_background(). With this method, an axum process that was holding 4.5 GB can be brought down to 0.5 GB after several refreshes; with mimalloc the memory can shrink to as little as 20 MB, a good result with almost no loss in performance.
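A minimal sketch of this refresh trick, assuming the swappable global handle is built with the arc_swap and once_cell crates (the post only describes std::mem::replace used "in a lock-free manner", so the exact mechanism and the grace period here are my assumptions):

```rust
use std::{sync::Arc, time::Duration};
use arc_swap::ArcSwap;
use once_cell::sync::Lazy;
use tokio::runtime::Runtime;

// Global, swappable runtime handle.
static RUNTIME: Lazy<ArcSwap<Runtime>> =
    Lazy::new(|| ArcSwap::from_pointee(Runtime::new().expect("build runtime")));

// In the accept loop, tasks are spawned onto whatever runtime is current:
//     RUNTIME.load().spawn(async move { /* handle the connection */ });

/// Swap in a fresh runtime and retire the old one in the background.
fn refresh_runtime(grace: Duration) {
    let fresh = Arc::new(Runtime::new().expect("build runtime"));
    // Lock-free pointer swap: new tasks immediately land on the fresh runtime.
    let old = RUNTIME.swap(fresh);

    // Give in-flight tasks on the old runtime time to finish, then shut it
    // down without blocking the caller.
    std::thread::spawn(move || {
        std::thread::sleep(grace);
        match Arc::try_unwrap(old) {
            // We hold the last reference: consume the runtime and release
            // everything it allocated.
            Ok(rt) => rt.shutdown_background(),
            // Someone still holds a clone; just drop our reference.
            Err(still_shared) => drop(still_shared),
        }
    });
}
```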
However, this approach has drawbacks. If the service carries long-lived connections such as HTTP/2, SSE, or WebSockets, shutting the old runtime down safely becomes difficult. Even when rt.metrics().num_alive_tasks() reports zero alive tasks, that does not necessarily mean the runtime can be shut down safely: there may still be established TCP connections whose clients are still reading data out of the TCP buffer, and terminating those connections at that point would discard the buffered data, so the client never receives it. Also, this method only works for axum and hyper and does not suit many other web frameworks.
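As an illustration only, the kind of drain step I mean looks like the sketch below; the extra grace period tries to cover bytes that peers are still reading out of kernel buffers, and even then it is a heuristic rather than a real safety guarantee:

```rust
use std::time::Duration;
use tokio::runtime::Runtime;

// Illustrative drain helper: wait until the retired runtime reports no alive
// tasks, then wait a further grace period before shutting it down.
fn drain_then_shutdown(old: Runtime, extra_grace: Duration) {
    while old.metrics().num_alive_tasks() > 0 {
        std::thread::sleep(Duration::from_millis(100));
    }
    // Zero alive tasks does not mean peers have finished reading buffered
    // data from their TCP connections, hence the extra wait.
    std::thread::sleep(extra_grace);
    old.shutdown_background();
}
```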
From these tests, tokio::runtime::Runtime does seem to hold onto memory instead of releasing it, although this cannot strictly be called a memory leak. It resembles basic data structures such as VecDeque, LinkedList, and HashMap: after inserting a large number of elements, even if all elements are removed, the process memory is not immediately returned to the system unless the variable holding the data structure is dropped. In most cases, dropping the container releases the process memory as expected.
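The container analogy is easy to demonstrate with a plain std type, as a quick illustration of what I mean (no tokio involved):

```rust
fn main() {
    let mut v: Vec<u64> = (0..10_000_000).collect();
    println!("len = {}, capacity = {}", v.len(), v.capacity());

    // Removing every element does not give the backing allocation back.
    v.clear();
    println!("after clear: len = {}, capacity = {}", v.len(), v.capacity());

    // Only shrinking (or dropping the Vec entirely) returns the buffer to
    // the allocator, which may then return it to the operating system.
    v.shrink_to_fit();
    println!("after shrink_to_fit: capacity = {}", v.capacity());
}
```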
To address the title: I hope tokio could provide a safe way to refresh the runtime. I am not familiar with all of the runtime's source code, but I suspect there are internal containers that grow under load and never return their capacity to the system. A method that manually refreshes all containers in the runtime (create new containers, move the elements from the old ones into them, and then safely drop the old ones) might mitigate the problem of memory not being released. Ideally this process would be lock-free, so that it does not affect spawning new tasks while the refresh runs.
It's important to note that during testing you should not rely on tokio::spawn + tokio::time::sleep alone for the concurrency load. With simple memory structures the system can sometimes reclaim the memory, but with the more complex coroutines inside hyper or axum you cannot guarantee that memory will contract. Destroying and recreating the whole runtime, however, always makes memory contract. Also, the code for million-level HTTP/2 concurrency is not complicated; the only thing to watch for is raising hyper's max_concurrent_streams parameter (I set it to 1,000,000), as in the sketch below.
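With the hyper 0.14-style server builder, that setting looks roughly like this (the handler and address are illustrative; in hyper 1.x the equivalent knob lives on hyper::server::conn::http2::Builder):

```rust
use std::{convert::Infallible, time::Duration};
use hyper::{
    service::{make_service_fn, service_fn},
    Body, Request, Response, Server,
};

// Same kind of slow handler as in the benchmark description: sleep, then respond.
async fn slow(_req: Request<Body>) -> Result<Response<Body>, Infallible> {
    tokio::time::sleep(Duration::from_secs(20)).await;
    Ok(Response::new(Body::from("ok")))
}

#[tokio::main]
async fn main() {
    let make_svc =
        make_service_fn(|_conn| async { Ok::<_, Infallible>(service_fn(slow)) });

    Server::bind(&([0, 0, 0, 0], 3000).into())
        // Advertise a very high HTTP/2 concurrent-stream limit so a single
        // client connection can keep on the order of a million requests in flight.
        .http2_max_concurrent_streams(1_000_000)
        .serve(make_svc)
        .await
        .unwrap();
}
```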
In summary, this memory-release issue is probably something only tokio itself can solve.
Additionally, I wrote an article (in Chinese) about testing Rust web frameworks, which can be machine-translated: rust的web框架单机百万并发的性能与开销 (roughly, "Performance and overhead of Rust web frameworks at one million concurrent connections on a single machine").