tonic
Memory leak in client streaming code
Hi all,
I ran into this issue in one of my projects. Since I cannot share the original code, I created a sample Scala gRPC server and a Rust client to demonstrate it. The client (Rust) streams binary data to the server (Scala). Every stream element consists of a 1 MB array, and every request has 130 elements. After some time the client's memory grows until the Linux OOM killer kills the client. The same client logic also exists in the Scala project.
tonic versions: tonic v0.4.1, tonic-build v0.4.1
uname -a: Linux capacman-neon 5.4.0-67-generic #75-Ubuntu SMP Fri Feb 19 18:03:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
The repos are: https://github.com/capacman/grpctest https://github.com/capacman/grpctest_rust
To run the server on the Scala side you can use:
sbt "runMain example.TestServer"
I think the issue occurs when the Scala side only takes the head value of the stream and then returns a response. I implemented a similar client with https://github.com/tikv/grpc-rs. There, after the fifth or sixth binary data chunk, the client gets an RpcFinished(None) or RemoteStopped error. I think tonic does not close the connection and keeps holding a reference to the stream, and at some point the OOM occurs.
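For a sense of scale, here is a back-of-the-envelope sketch of the payload described above (130 elements of 1 MB per request). The helper names are hypothetical, "1 MB" is taken to mean 1 MiB, and the worst case assumes every request leaks its entire payload, which nobody in this thread has measured.

```rust
/// Bytes sent per request: `elements` chunks of `element_bytes` each.
fn payload_bytes_per_request(elements: u64, element_bytes: u64) -> u64 {
    elements * element_bytes
}

/// Worst-case number of requests that fit into `memory_bytes`,
/// ASSUMING (hypothetically) that every request leaks `leaked_per_request` bytes.
/// Returns `None` if `leaked_per_request` is zero.
fn requests_until_exhausted(memory_bytes: u64, leaked_per_request: u64) -> Option<u64> {
    memory_bytes.checked_div(leaked_per_request)
}
```

With 130 elements of 1 MiB each, one request carries 130 MiB, so under that worst-case assumption a machine with 16 GiB of memory (an arbitrary example size) would be exhausted after roughly 126 requests.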
So is the issue that making a client to server stream appears to leak memory if the server only looks at the Request and doesn't actually consume the stream?
In my case, yes. The server just takes the head of the stream, for example. I think @tinternet's case is similar but reversed: there, if the client stops consuming the response, the server leaks memory.
I don't quite know enough about the internals but could this be fixed by using tcp/http2 keepalive? Tonic lets hyper and h2 handle all the low level transport/connection parts.
I can try it and report the results. Are there any flags for TCP/HTTP2 keepalive?
I tried keepalive and something interesting happened. If I set http2_keep_alive_interval on the endpoint to less than the loop sleep time (for example a 30 ms sleep and a 10 ms keepalive interval), memory does not grow. But if I set it to more than the sleep time, or remove it, memory grows very quickly and the OOM killer kills the process.
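For reference, a minimal sketch of how these settings might be applied on a tonic 0.4 `Endpoint`. The address comes from the repro and the 10 ms interval mirrors the value mentioned above; the function name, the 5 s timeout, and `keep_alive_while_idle(true)` are illustrative assumptions, not values tested in this thread. It needs the `tonic` crate as a dependency.

```rust
use std::time::Duration;
use tonic::transport::{Channel, Endpoint, Error};

/// Build a channel with HTTP/2 keepalive pings enabled.
/// `connect_with_keepalive` is a hypothetical helper name.
async fn connect_with_keepalive() -> Result<Channel, Error> {
    Endpoint::from_static("http://localhost:9000")
        // Send an HTTP/2 PING at this interval (10 ms, as in the experiment above).
        .http2_keep_alive_interval(Duration::from_millis(10))
        // Assumed value: treat the connection as dead if a PING is not acknowledged in time.
        .keep_alive_timeout(Duration::from_secs(5))
        // Assumed value: keep pinging even when no requests are in flight.
        .keep_alive_while_idle(true)
        .connect()
        .await
}
```

The resulting `Channel` could then be passed to the generated client with `TestLoaderServiceClient::new(channel)` instead of calling `TestLoaderServiceClient::connect(...)` directly.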
What do you mean by "loop sleep time"?
I put a reproducible example in those repos: https://github.com/capacman/grpctest (scala server side) https://github.com/capacman/grpctest_rust (rust client side)
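loop {
    // A new connection is opened on every iteration.
    let client = TestLoaderServiceClient::connect("http://localhost:9000").await?;
    // Stream a fresh copy of the whole payload and wait for the single response.
    let result = client.clone().load(tokio_stream::iter(stream.clone())).await?.into_inner();
    println!("response is {:?}", result);
    // This is the "loop sleep time".
    tokio::time::sleep(Duration::from_millis(60)).await;
}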
This calls the gRPC service in a loop. Memory usage does not increase if the sleep time on the last line is longer than http2_keep_alive_interval on the client endpoint. However, if http2_keep_alive_interval is not set and the sleep time is short (60 ms, for example), or the sleep time is significantly shorter than http2_keep_alive_interval (say, half of it), memory usage increases until the operating system terminates the process.
Ping @seanmonstar @LucioFranco. Do you know anything about this?
Would it be possible to reproduce the server side in Rust rather than Scala? I don't have much experience reading Scala. Another thing to check is whether the stream goes away. If you never consume the rest of the stream, it may get held up at the other end, taking up memory.
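To illustrate that last point, here is a toy model using only the standard library. It is not how tonic or h2 buffer data; the function names are hypothetical. It only shows the general effect: when a consumer reads just the head of a stream, an unbounded queue keeps every remaining chunk in memory, while a bounded queue stops accepting chunks once it is full.

```rust
use std::sync::mpsc;

/// Push `chunks` chunks of `chunk_len` bytes into an unbounded queue,
/// let the consumer read only the first chunk, and return how many
/// bytes are still sitting in the queue afterwards.
fn bytes_left_in_unbounded_queue(chunks: usize, chunk_len: usize) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    for _ in 0..chunks {
        tx.send(vec![0u8; chunk_len]).expect("receiver is still alive");
    }
    drop(tx); // The producer is done.
    let _head = rx.recv().ok(); // The consumer looks at the head only.
    // Everything that was never consumed is still held in memory.
    rx.try_iter().map(|chunk| chunk.len()).sum()
}

/// Try to push `chunks` chunks into a queue bounded to `capacity` chunks
/// while the consumer reads nothing. Returns how many chunks were accepted
/// before the queue applied backpressure.
fn chunks_accepted_by_bounded_queue(chunks: usize, chunk_len: usize, capacity: usize) -> usize {
    // `_rx` keeps the receiving side alive without ever reading from it.
    let (tx, _rx) = mpsc::sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    for _ in 0..chunks {
        if tx.try_send(vec![0u8; chunk_len]).is_err() {
            break; // Queue is full: the producer has to wait or give up.
        }
        accepted += 1;
    }
    accepted
}
```

For example, `bytes_left_in_unbounded_queue(130, 1024)` leaves 129 chunks queued, whereas `chunks_accepted_by_bounded_queue(130, 1024, 4)` accepts only 4 chunks before refusing more.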