tonic icon indicating copy to clipboard operation
tonic copied to clipboard

grpc server memory leak

Open whfuyn opened this issue 4 years ago • 8 comments

Bug Report

Version

└── tonic v0.5.2
└── tonic-build v0.5.2

Platform

Linux XXX 5.11.0-37-generic #41-Ubuntu SMP Mon Sep 20 16:39:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description

tonic grpc server takes a lot of memory under high load and end up killed by the kernel for OOM.

The memory usage didn't drop even after the client stopped.

I made a simple demo with tonic related code only: https://github.com/whfuyn/grpc-test

top -p `pgrep -d ',' 'grpc-'`

PID     USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
_       _      20   0 3341.6m   2.5g   3.7m S   0.0   7.9   1:27.75 grpc-server

Other info:

  1. didn't occur in Windows 10.
  2. didn't occur when [tokio::main(flavor = "current_thread")] and single client instance, but still occur when there are multiple clients (2 in my test).
  3. mitigated when max_concurrent_streams is set, but results in some client requests timeout.
  4. memory usage drop after client stopped when run with valgrind.

Also post the demo code here:

proto file

syntax = "proto3";
package test;

message Req {
    bytes payload = 1;
}

message Resp {
    bytes payload = 1;
}

service TestService {
    rpc call(Req) returns (Resp);
}

server

struct Service;

#[tonic::async_trait]
impl TestService for Service {
    async fn call(&self, _req: Request<Req>) -> Result<Response<Resp>, Status> {
        Ok(Response::new(Resp{ payload: vec![] }))
    }
}

#[tokio::main]
async fn main() {
    let hostname = "0.0.0.0";
    let port = 6666;

    let addr = format!("{}:{}", hostname, port).parse().unwrap();
    tonic::transport::Server::builder()
        .add_service(TestServiceServer::new(Service))
        .serve(addr)
        .await
        .unwrap();
}

client, run multiple instances to reproduce this quickly.

let hostname = "localhost";
let port = 6666;

let client = {
    let uri = format!("http://{}:{}", hostname, port);
    let channel = Endpoint::from_shared(uri)
        .unwrap()
        .connect_lazy()
        .unwrap();
    TestServiceClient::new(channel)
};

let hs: Vec<tokio::task::JoinHandle<()>> = (0..200_000).map(|_| {
    let mut client = client.clone();
    tokio::spawn(async move {
        client.call(Req{ payload: vec![6; 1000]}).await.unwrap();
    })
})
.collect();

for h in hs {
    h.await.unwrap();
}

whfuyn avatar Oct 19 '21 03:10 whfuyn

It seems caused by hyper. https://github.com/hyperium/hyper/issues/1790

whfuyn avatar Oct 19 '21 08:10 whfuyn

Seems that I stuck with the same issue as well. I've tried to debug your example with console-subscriber and I found out remarkably increasing amount of task after some time (~30 secs). After client finishing the task amount doesn't decrease.

Before running a client: Screenshot 2021-12-21 at 17 03 15

After running a client: Screenshot 2021-12-21 at 17 06 04

aleksuss avatar Dec 21 '21 15:12 aleksuss

It's a good idea to use console. I'll try to dig into it later.

whfuyn avatar Dec 23 '21 02:12 whfuyn

@seanmonstar ^

LucioFranco avatar Dec 30 '21 18:12 LucioFranco

This is probably a better issue for hyper so might be good to create a minimal reproduction.

LucioFranco avatar Dec 30 '21 18:12 LucioFranco

@LucioFranco by minimal reproduction, do you mean of hyper code or of tonic code? ~This behavior (of tasks leaking) is present even in the helloworld-server example.~ Here's a screenshot from tokio-console after a single invocation of the helloworld-client: 20220630_15h32m45s_grim And here's a screenshot from tokio-console after 79 more invocations: 20220630_15h40m58s_grim edit: I had missed the active (running+idle) task count and was instead looking at the total task count, which includes tasks that have completed.

In order for tokio-console to work, I added only the following to the helloworld-server example (in addition to adding console-subscriber to the Cargo.toml):

diff --git a/examples/src/helloworld/server.rs b/examples/src/helloworld/server.rs
index c6398bb..6502e61 100644
--- a/examples/src/helloworld/server.rs
+++ b/examples/src/helloworld/server.rs
@@ -27,6 +27,7 @@ impl Greeter for MyGreeter {

 #[tokio::main]
 async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    console_subscriber::init();
     let addr = "[::1]:50051".parse().unwrap();
     let greeter = MyGreeter::default();

The memory usage ~(and slowdown from so many tasks)~ grows slowly, but it grows indefinitely.

nashley avatar Jun 30 '22 15:06 nashley

Actually, I think I just forgot how to read tokio-console and ignored the fact that the tasks are no longer running (they are visibly removed with tokio-console --retain-for 1us).

My apologies.

The memory usage does still seem to grow over time even with the simple helloworld-server, but I'll do some more digging first to better understand what's happening, since tasks don't appear to be leaking.

nashley avatar Jun 30 '22 15:06 nashley

Not sure if this applies to OP, but I think my issue at least was actually due to expected behavior within console-subscriber itself (i.e., keeping relevant information around for an hour after a task completes by default): https://github.com/tokio-rs/console/issues/184#issuecomment-984823506

Running the example as-is (i.e., without console-subscriber) does not appear to cause any memory usage growth.

nashley avatar Jun 30 '22 18:06 nashley