
LEAK: ByteBuf.release() was not called before it's garbage-collected

Open · natraj09 opened this issue on Sep 24, 2022 · 11 comments

What version of gRPC-Java are you using?

1.49.1

What is your environment?

JDK 17 , linux

What did you expect to see?

No LEAK detection message

What did you see instead?

 message:  LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 

Here's the complete exception trace. leak_record.txt

Steps to reproduce the bug

We have a gRPC service that calls another gRPC service. We saw the leak detection message a few minutes after the server started up and requests were sent to the service. I turned leak detection up to paranoid by setting -Dio.grpc.netty.shaded.io.leakDetection.level=paranoid, and we immediately started to see a lot of LEAK messages.
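
For reference, the detector level can also be set programmatically instead of via the JVM flag; a minimal sketch, assuming the grpc-netty-shaded artifact (which relocates Netty under io.grpc.netty.shaded):

import io.grpc.netty.shaded.io.netty.util.ResourceLeakDetector;

// Must run before any buffers are allocated, e.g. at the top of main().
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);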

We suspected it has something to do with setting a deadline.

Here's a sample snippet

ServingProto.Response response;
try {
  // Each call derives a stub with a fresh deadline of deadlineInMs from now.
  response =
      this.client
          .withDeadlineAfter(deadlineInMs, TimeUnit.MILLISECONDS)
          .getX(request);
} catch (StatusRuntimeException e) {
  // On failure (including DEADLINE_EXCEEDED), log and fall back to Y.
  logger.error("Error", e);
  return Y(request);
}

We removed the deadline and, with the paranoid setting, we don't see any error messages anymore.

Is there anything wrong with the way the deadline is handled in the above scenario or is this a bug?

natraj09 avatar Sep 24 '22 03:09 natraj09

I don't see anything obviously wrong. Do the RPCs time out when you set the deadline? Also have you tried increasing the timeout?
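
One way to confirm that is to check the status code in the catch block; a minimal sketch based on the snippet above (logger, deadlineInMs, Y, and request come from that snippet, and io.grpc.Status is assumed to be imported):

    } catch (StatusRuntimeException e) {
      // Deadline expirations surface as DEADLINE_EXCEEDED; any other code is a different failure.
      if (e.getStatus().getCode() == Status.Code.DEADLINE_EXCEEDED) {
        logger.error("RPC exceeded the deadline of " + deadlineInMs + " ms", e);
      } else {
        logger.error("RPC failed: " + e.getStatus(), e);
      }
      return Y(request);
    }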

sanjaypujare avatar Sep 26 '22 06:09 sanjaypujare

Potentially related: #9340

ejona86 avatar Sep 26 '22 22:09 ejona86

@sanjaypujare RPCs do time out; we see sporadic timeouts and the error message inside the catch block. @ejona86 I looked at that issue before posting here; there is no retry mechanism in the gRPC client.

natraj09 avatar Sep 27 '22 00:09 natraj09

... @ejona86 I looked at that issue before posting here; there is no retry mechanism in the gRPC client.

Just trying to clarify: do you mean you have disableRetry() set on the client channel? Also, if you are using 1.49.1 it should already have the fix for #9340, so this might be a different (but related) issue...

sanjaypujare avatar Sep 27 '22 16:09 sanjaypujare

@sanjaypujare Is retry a default option? I didn’t set disableRetry explicitly

natraj09 avatar Sep 27 '22 17:09 natraj09

@sanjaypujare Is retry a default option? I didn’t set disableRetry explicitly

Transparent retries are enabled by default (AFAIK). Please disable retry and see.
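
For reference, a minimal sketch of turning retries off when the channel is built (uses io.grpc.ManagedChannelBuilder; the target address and plaintext setting are hypothetical):

    ManagedChannel channel =
        ManagedChannelBuilder.forAddress("localhost", 50051)
            .usePlaintext()
            .disableRetry()  // turn off the library's retry mechanism for RPCs on this channel
            .build();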

sanjaypujare avatar Sep 27 '22 18:09 sanjaypujare

disableRetry worked. I don't see the LEAK message any more, but I'm seeing a new exception. This seems to happen on server startup when the first few requests exceed the deadline. Is this expected?

io.grpc.StatusRuntimeException: UNAVAILABLE: Abrupt GOAWAY closed sent stream. HTTP/2 error code: NO_ERROR
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
	at feast.proto.serving.ServingServiceGrpc$ServingServiceBlockingStub.getOnlineFeatures(ServingServiceGrpc.java:247)

natraj09 avatar Sep 28 '22 01:09 natraj09

Yes, startup of any app in Java can cause sluggishness, so this is to be expected. If this resolves your problem, you can close this issue. Another thing to verify is whether the fix in https://github.com/grpc/grpc-java/pull/9360 is working for you.

sanjaypujare avatar Sep 28 '22 03:09 sanjaypujare

Disabling retries "working" is itself a workaround. We're happy there is a workaround, but it now seems clearer that this is a bug triggered by retries. My previous workaround wasn't enough, since 1.49.1 already included it.

My "fix" was for the writing direction. This issue looks to be the read path based on leak_record.txt.

ejona86 avatar Sep 28 '22 03:09 ejona86

@sanjaypujare If I understand correctly, that is just a workaround. Should I keep this open until there is a permanent fix?

natraj09 avatar Sep 28 '22 12:09 natraj09

@natraj09 Yes. @ejona86's comments also indicate this is a different issue, so let's keep it open.

sanjaypujare avatar Sep 28 '22 14:09 sanjaypujare

Hi, has this issue been resolved? We observed the same LEAK message recently, and after turning off retries, it disappeared.

SzyWilliam avatar Oct 24 '22 16:10 SzyWilliam

This issue is still open/unresolved.

sanjaypujare avatar Oct 24 '22 17:10 sanjaypujare