gRPC server memory leak
Bug Report
Version
└── tonic v0.5.2
└── tonic-build v0.5.2
Platform
Linux XXX 5.11.0-37-generic #41-Ubuntu SMP Mon Sep 20 16:39:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Description
The tonic gRPC server takes a lot of memory under high load and ends up killed by the kernel for OOM.
The memory usage didn't drop even after the client stopped.
I made a simple demo with tonic-related code only:
https://github.com/whfuyn/grpc-test
top -p `pgrep -d ',' 'grpc-'`
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
_ _ 20 0 3341.6m 2.5g 3.7m S 0.0 7.9 1:27.75 grpc-server
Other info:
- Didn't occur on Windows 10.
- Didn't occur with `#[tokio::main(flavor = "current_thread")]` and a single client instance, but still occurred when there were multiple clients (2 in my test).
- Mitigated when `max_concurrent_streams` is set (see the builder sketch after the server code below), but then some client requests time out.
- Memory usage dropped after the client stopped when run under valgrind.
Also posting the demo code here:
proto file
syntax = "proto3";
package test;
message Req {
bytes payload = 1;
}
message Resp {
bytes payload = 1;
}
service TestService {
rpc call(Req) returns (Resp);
}
server
// Imports and the generated module are filled in so the snippet compiles on
// its own; the `test` module paths assume the default tonic-build output for
// the proto above.
use tonic::{Request, Response, Status};

use crate::test::test_service_server::{TestService, TestServiceServer};
use crate::test::{Req, Resp};

pub mod test {
    tonic::include_proto!("test");
}

struct Service;

#[tonic::async_trait]
impl TestService for Service {
    async fn call(&self, _req: Request<Req>) -> Result<Response<Resp>, Status> {
        Ok(Response::new(Resp { payload: vec![] }))
    }
}

#[tokio::main]
async fn main() {
    let hostname = "0.0.0.0";
    let port = 6666;
    let addr = format!("{}:{}", hostname, port).parse().unwrap();
    tonic::transport::Server::builder()
        .add_service(TestServiceServer::new(Service))
        .serve(addr)
        .await
        .unwrap();
}
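As an aside on the `max_concurrent_streams` mitigation from the list above, this is roughly what setting it looks like on this server (a sketch only, reusing `Service` and `TestServiceServer` from the code above; the limit of 128 is an arbitrary example, not a recommendation):

```rust
use tonic::transport::Server;

#[tokio::main]
async fn main() {
    let addr = "0.0.0.0:6666".parse().unwrap();
    Server::builder()
        // Cap the number of concurrent HTTP/2 streams per connection.
        // This bounds how much the server buffers at once, at the cost of
        // queueing (and possibly timing out) excess client calls.
        .max_concurrent_streams(128)
        .add_service(TestServiceServer::new(Service))
        .serve(addr)
        .await
        .unwrap();
}
```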
client (run multiple instances to reproduce this quickly)
let hostname = "localhost";
let port = 6666;
let client = {
let uri = format!("http://{}:{}", hostname, port);
let channel = Endpoint::from_shared(uri)
.unwrap()
.connect_lazy()
.unwrap();
TestServiceClient::new(channel)
};
let hs: Vec<tokio::task::JoinHandle<()>> = (0..200_000).map(|_| {
let mut client = client.clone();
tokio::spawn(async move {
client.call(Req{ payload: vec![6; 1000]}).await.unwrap();
})
})
.collect();
for h in hs {
h.await.unwrap();
}
It seems to be caused by hyper: https://github.com/hyperium/hyper/issues/1790
It seems I'm stuck with the same issue as well. I've tried to debug your example with console-subscriber and found a remarkably increasing number of tasks after some time (~30 secs). After the client finishes, the task count doesn't decrease.
Before running a client: (screenshot of the tokio-console task list)

After running a client: (screenshot of the tokio-console task list)
It's a good idea to use console. I'll try to dig into it later.
@seanmonstar ^
This is probably a better issue for hyper, so it might be good to create a minimal reproduction.
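If it helps, a hyper-only server along these lines might be a starting point for that reproduction (a rough sketch, assuming hyper 0.14 with the `full` feature set; it mirrors the tonic demo by forcing HTTP/2 on port 6666 and returning empty responses, and the load-generating client would need to be ported to raw HTTP/2 as well):

```rust
use std::convert::Infallible;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};

#[tokio::main]
async fn main() {
    // Answer every request with an empty body, like the tonic demo's `call`.
    let make_svc = make_service_fn(|_conn| async {
        Ok::<_, Infallible>(service_fn(|_req: Request<Body>| async {
            Ok::<_, Infallible>(Response::new(Body::empty()))
        }))
    });

    Server::bind(&([0, 0, 0, 0], 6666).into())
        // gRPC runs over HTTP/2, so force HTTP/2 to keep the setup comparable.
        .http2_only(true)
        .serve(make_svc)
        .await
        .unwrap();
}
```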
@LucioFranco by minimal reproduction, do you mean of hyper code or of tonic code?
~This behavior (of tasks leaking) is present even in the helloworld-server example.~
Here's a screenshot from tokio-console after a single invocation of the helloworld-client:
And here's a screenshot from tokio-console after 79 more invocations:
edit: I had missed the active (running+idle) task count and was instead looking at the total task count, which includes tasks that have completed.
In order for tokio-console to work, I added only the following to the helloworld-server example (in addition to adding console-subscriber to the Cargo.toml):
diff --git a/examples/src/helloworld/server.rs b/examples/src/helloworld/server.rs
index c6398bb..6502e61 100644
--- a/examples/src/helloworld/server.rs
+++ b/examples/src/helloworld/server.rs
@@ -27,6 +27,7 @@ impl Greeter for MyGreeter {
 #[tokio::main]
 async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    console_subscriber::init();
     let addr = "[::1]:50051".parse().unwrap();
     let greeter = MyGreeter::default();
The memory usage ~(and slowdown from so many tasks)~ grows slowly, but it grows indefinitely.
Actually, I think I just forgot how to read tokio-console and ignored the fact that the tasks are no longer running (they are visibly removed with `tokio-console --retain-for 1us`).
My apologies.
The memory usage does still seem to grow over time even with the simple helloworld-server, but I'll do some more digging first to better understand what's happening, since tasks don't appear to be leaking.
Not sure if this applies to OP, but I think my issue at least was actually due to expected behavior within console-subscriber itself (i.e., keeping relevant information around for an hour after a task completes by default): https://github.com/tokio-rs/console/issues/184#issuecomment-984823506
Running the example as-is (i.e., without console-subscriber) does not appear to cause any memory usage growth.
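If anyone else trips over this, the retention window can apparently be shortened when setting up the subscriber manually; something like the following (a sketch only: I'm assuming a `ConsoleLayer::builder().retention(..)` builder method based on my reading of the console-subscriber docs, and one minute is an arbitrary value):

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    // Assumed console-subscriber builder API: keep data about completed
    // tasks for one minute instead of the default hour, so the profiling
    // layer's own bookkeeping doesn't look like a server memory leak.
    console_subscriber::ConsoleLayer::builder()
        .retention(Duration::from_secs(60))
        .init();

    // ... start the server under test as above ...
}
```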