grpc-java icon indicating copy to clipboard operation
grpc-java copied to clipboard

InProcess: client streaming memory leak when retry enabled

Open systemfreund opened this issue 3 years ago • 9 comments

What version of gRPC-Java are you using?

1.42.0

What is your environment?

Linux 5.15.2 openjdk version "11.0.13" 2021-10-19

What did you expect to see?

No java heap space exception.

What did you see instead?

java.lang.OutOfMemoryError: Java heap space

Steps to reproduce the bug

We've recently updated our grpc dependency from 1.39.0 to 1.42.0 which now causes an out of memory error in one of our tests.

I've attached a test project that can reproduce the problem. grpcheap.zip

How to run:

mvn clean test -Dgrpc.version=1.39.0   # test runs successfully
mvn clean test -Dgrpc.version=1.42.0   # test fails with OOM

Please be aware that the test is run with limited heap size (see pom.xml): -Xmx64M

The general idea of the test is this: Client-side has an InputStream that produces ~200mb data. This data of that stream is (lazily) split into ~1mb chunks and these are then streamed to the server.

What happens when I run the test with grpc >1.40 is that I get an OOM when it has sent ~60mb worth of chunks, which obviously is near the -Xmx limit. However, what I would expect is that the heap usage would never rise to that level, as after each 1mb chunk has been sent that chunk can be gc'ed. I've also attached a screenshot of visualvm where you can see the heap graph of the test using grpc 1.39.0 vs 1.42.0:

grpc

systemfreund avatar Nov 18 '21 12:11 systemfreund

Given retries were enabled in 1.40, it seems likely that there's an interaction with retries that are retaining data. Given that much data, the RetriableStream should commit itself. Maybe the real stream isn't catching up so passThrough != true?

As a workaround, ManagedChannelBuilder.disableRetry() probably will "fix" this behavior.

ejona86 avatar Nov 22 '21 17:11 ejona86

Yes, since 1.40, retry is enabled by default, so for client streaming RPC, the outbound messages will be cached in memory (retry buffer) until the call is committed, unless the buffer limit exceeded. So you need either ManagedChannelBuilder.disableRetry() or set a buffer limit small enough for -Xmx64M with ManagedChannelBuilder.perRpcBufferLimit() or ManagedChannelBuilder.retryBufferSize().

For your test, you can not use GrpcServerRule to set them because you can not access the ManagedChannelBuilder. You could use GrpcCleanupRule instead.

dapengzhang0 avatar Nov 23 '21 00:11 dapengzhang0

@dapengzhang0, isn't the per_rpc_buffer_limit default in Java 1 MB though? The cross-RPC default limit is 16 MB. It seems they have at least 30 MB available, so I don't see how retry could be properly functioning for this single RPC.

ejona86 avatar Nov 23 '21 00:11 ejona86

isn't the per_rpc_buffer_limit default in Java 1 MB though? The cross-RPC default limit is 16 MB. It seems they have at least 30 MB available

Oh, right. That means setting ManagedChannelBuilder.perRpcBufferLimit() and ManagedChannelBuilder.retryBufferSize() won't work.

I think the problem comes from InProcessTransport.InProcessStream.InProcessClientStream.writeMessage(), which doesn't trigger StreamTracer.outboundWireSize() (because it doesn't use a Framer), unlike AbstractStream.writeMessage() does. So the buffer limit never gets exceeded for InProcessClientStream.

dapengzhang0 avatar Nov 23 '21 01:11 dapengzhang0

Thanks all for your responses. I'll close the issue then, as it doesnt appear to be a bug.

systemfreund avatar Nov 23 '21 06:11 systemfreund

@ejona86 We still need to improve the behavior for InProcessTransport? Does it make sense to call outboundWireSize() for InProcess although there is actually no "wire"?

dapengzhang0 avatar Nov 23 '21 17:11 dapengzhang0

I think it'd make sense to add a mode to the inprocess transport to have it serialize payloads so that the sizes are reported. I also wonder if we should disable retries on inprocess by default.

ejona86 avatar Nov 23 '21 19:11 ejona86

I reopen the issue to track the InProcess payload size report/enable retry issue.

dapengzhang0 avatar Nov 23 '21 22:11 dapengzhang0

I am experiencing the same memory leak issue with gRPC-Java v1.45.1

Environment:

Linux 4.18.0-305.19.1.el8_4.x86_64 OpenJDK 64-Bit Server VM 21.9 (build 17.0.3+7-LTS, mixed mode, sharing)

ThoSap avatar May 26 '22 08:05 ThoSap