grpc-java
InProcess: client streaming memory leak when retry enabled
What version of gRPC-Java are you using?
1.42.0
What is your environment?
Linux 5.15.2 openjdk version "11.0.13" 2021-10-19
What did you expect to see?
No Java heap space exception.
What did you see instead?
java.lang.OutOfMemoryError: Java heap space
Steps to reproduce the bug
We've recently updated our grpc dependency from 1.39.0 to 1.42.0, which now causes an out-of-memory error in one of our tests.
I've attached a test project that can reproduce the problem. grpcheap.zip
How to run:
mvn clean test -Dgrpc.version=1.39.0 # test runs successfully
mvn clean test -Dgrpc.version=1.42.0 # test fails with OOM
Please be aware that the test is run with limited heap size (see pom.xml): -Xmx64M
The general idea of the test is this:
The client side has an InputStream that produces ~200 MB of data. That stream is lazily split into ~1 MB chunks, which are then streamed to the server.
When I run the test with grpc 1.40+ I get an OOM once it has sent ~60 MB worth of chunks, which is obviously near the -Xmx limit. What I would expect instead is that heap usage never rises to that level, since after each 1 MB chunk has been sent, that chunk can be gc'ed. I've also attached a VisualVM screenshot showing the heap graph of the test under grpc 1.39.0 vs 1.42.0.
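For illustration, the client side boils down to roughly the following sketch (the Chunk message and UploadService stub are placeholder names; the real code is in the attached project):

```java
import com.google.protobuf.ByteString;
import io.grpc.stub.StreamObserver;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical proto: rpc Upload(stream Chunk) returns (UploadReply);
//                     message Chunk { bytes data = 1; }
void streamChunks(UploadServiceGrpc.UploadServiceStub asyncStub,
                  InputStream in,
                  StreamObserver<UploadReply> responseObserver) throws IOException {
  StreamObserver<Chunk> requests = asyncStub.upload(responseObserver);
  byte[] buf = new byte[1024 * 1024];           // ~1 MB chunks
  int n;
  while ((n = in.read(buf)) != -1) {            // lazily reads ~200 MB in total
    requests.onNext(Chunk.newBuilder()
        .setData(ByteString.copyFrom(buf, 0, n))
        .build());
  }
  requests.onCompleted();
  // Expectation: each chunk becomes garbage once onNext() returns, so the
  // heap should stay well below -Xmx64M.
}
```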
Given retries were enabled in 1.40, it seems likely that there's an interaction with retries that is retaining data. With that much data, the RetriableStream should commit itself. Maybe the real stream isn't catching up, so passThrough != true?
As a workaround, ManagedChannelBuilder.disableRetry() probably will "fix" this behavior.
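A minimal sketch of that workaround (the target and plaintext setting are illustrative):

```java
ManagedChannel channel = ManagedChannelBuilder.forTarget("localhost:50051")
    .usePlaintext()
    .disableRetry()  // retries off, so outbound messages aren't buffered for replay
    .build();
```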
Yes, since 1.40 retry is enabled by default, so for client-streaming RPCs the outbound messages are cached in memory (the retry buffer) until the call is committed, unless the buffer limit is exceeded. So you need either ManagedChannelBuilder.disableRetry(), or a buffer limit small enough for -Xmx64M, set with ManagedChannelBuilder.perRpcBufferLimit() or ManagedChannelBuilder.retryBufferSize().
For your test you cannot use GrpcServerRule to set these, because it gives you no access to the ManagedChannelBuilder. You could use GrpcCleanupRule instead.
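For example, a minimal JUnit 4 sketch (the service implementation and test names are placeholders):

```java
import io.grpc.ManagedChannel;
import io.grpc.inprocess.InProcessChannelBuilder;
import io.grpc.inprocess.InProcessServerBuilder;
import io.grpc.testing.GrpcCleanupRule;
import org.junit.Rule;
import org.junit.Test;

public class UploadTest {
  @Rule
  public final GrpcCleanupRule grpcCleanup = new GrpcCleanupRule();

  @Test
  public void streamsWithoutRetryBuffering() throws Exception {
    String name = InProcessServerBuilder.generateName();
    grpcCleanup.register(InProcessServerBuilder.forName(name)
        .directExecutor()
        .addService(new UploadServiceImpl())    // placeholder service impl
        .build()
        .start());
    ManagedChannel channel = grpcCleanup.register(InProcessChannelBuilder.forName(name)
        .directExecutor()
        .disableRetry()                         // or cap the retry buffers instead:
        // .perRpcBufferLimit(1024 * 1024)      // 1 MB per call
        // .retryBufferSize(4L * 1024 * 1024)   // 4 MB across all calls
        .build());
    // ... drive the client-streaming call over `channel` ...
  }
}
```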
@dapengzhang0, isn't the per_rpc_buffer_limit default in Java 1 MB though? The cross-RPC default limit is 16 MB. It seems they have at least 30 MB available, so I don't see how retry could be properly functioning for this single RPC.
> isn't the per_rpc_buffer_limit default in Java 1 MB though? The cross-RPC default limit is 16 MB. It seems they have at least 30 MB available
Oh, right. That means setting ManagedChannelBuilder.perRpcBufferLimit() and ManagedChannelBuilder.retryBufferSize() won't work.
I think the problem comes from InProcessTransport.InProcessStream.InProcessClientStream.writeMessage(), which doesn't trigger StreamTracer.outboundWireSize() (because it doesn't use a Framer) the way AbstractStream.writeMessage() does. So the buffer limit never gets exceeded for InProcessClientStream.
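One way to check that diagnosis (a hypothetical probe, not part of the attached test) is to wrap the channel with a ClientStreamTracer and watch whether outboundWireSize() ever fires; on the in-process transport it never should:

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ClientInterceptors;
import io.grpc.ClientStreamTracer;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;

static Channel withWireSizeProbe(Channel channel) {
  ClientStreamTracer.Factory tracerFactory = new ClientStreamTracer.Factory() {
    @Override
    public ClientStreamTracer newClientStreamTracer(
        ClientStreamTracer.StreamInfo info, Metadata headers) {
      return new ClientStreamTracer() {
        @Override
        public void outboundWireSize(long bytes) {
          // On a socket transport this fires for every framed message; the
          // retry buffer accounting keys off these callbacks. With no Framer
          // in the in-process transport, it is never invoked.
          System.err.println("outbound wire bytes: " + bytes);
        }
      };
    }
  };
  return ClientInterceptors.intercept(channel, new ClientInterceptor() {
    @Override
    public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
        MethodDescriptor<ReqT, RespT> method, CallOptions options, Channel next) {
      return next.newCall(method, options.withStreamTracerFactory(tracerFactory));
    }
  });
}
```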
Thanks all for your responses. I'll close the issue then, as it doesn't appear to be a bug.
@ejona86 Do we still need to improve the behavior for InProcessTransport? Does it make sense to call outboundWireSize() for InProcess even though there is actually no "wire"?
I think it'd make sense to add a mode to the inprocess transport to have it serialize payloads so that the sizes are reported. I also wonder if we should disable retries on inprocess by default.
I'm reopening the issue to track InProcess payload size reporting and whether retry should be enabled for it.
I am experiencing the same memory leak issue with gRPC-Java v1.45.1
Environment:
Linux 4.18.0-305.19.1.el8_4.x86_64 OpenJDK 64-Bit Server VM 21.9 (build 17.0.3+7-LTS, mixed mode, sharing)