grpc-java icon indicating copy to clipboard operation
grpc-java copied to clipboard

Random io.grpc.StatusRuntimeException: CANCELLED exceptions in Server-Side Streaming RPC

Open dennyjames opened this issue 9 months ago • 7 comments

gRPC Version : 1.38.0 Java Version : openjdk version "11.0.27" 2025-04-15 LTS

This issue is random and is not reproducible in a test environment. Even on the environments were this is seen, issue happens randomly.

I have a server side streaming gRPC method, That returns a Iterator to the client. While looping on the iterator, below error occurs on the client side.

6066f128-75d7-42a2-af96-40c98bca2137 orgId=649529784 user=55fcc549-5f6a-42d7-85a0-2786f11a580b null
io.grpc.StatusRuntimeException: CANCELLED: Server sendMessage() failed with Error
	at io.grpc.Status.asRuntimeException(Status.java:532)
	at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:662)
	at com.***.data.result.BatchedSqlResultSet.next(BatchedSqlResultSet.java:89)
	at com.***.common.sql.SqlParser.getFalconResultSet(SqlParser.java:95)
	at com.***.data.manager.exporter.DataExporterImpl.exportTableVisualization(DataExporterImpl.java:1064)
	at com.***.data.manager.exporter.DataExporterImpl.exportTableVisualization(DataExporterImpl.java:927)
	at com.***.data.manager.exporter.DataExporterImpl.exportDataForVisualization(DataExporterImpl.java:640)
	at com.***.data.manager.exporter.DataExporterImpl.exportDataForVisualizations(DataExporterImpl.java:538)

On the server side, I see the below error

java.lang.IllegalStateException: call already closed
        at com.google.common.base.Preconditions.checkState(Preconditions.java:502)
        at io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:218)
        at io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:211)
        at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
        at com.***.server.DataManagerGrpcImpl.executeStreamQuery(DataManagerGrpcImpl.java:177)

The fact that the error just mentions CANCELLED, without a detailed error makes it hard to understand the error.

dennyjames avatar May 27 '25 06:05 dennyjames

It looks like your service impl is unaware of the call being already cancelled from the server side (which could happen due to server side call timeout for example) but it still tries to write to the response stream. If your service impl uses async processing, the ServerCall's onCancel will indicate to the ServerCallStreamObserver's cancellation handler so you can stop writing to the stream. If your service impl blocks the calling thread, then ServerCall's onCancel will be called only after the onMessage returns and in this case you need to either use Context (polling or Listener) or ServerCallStreamObserver.isCancelled() (polling-only) to learn of the cancellation.

kannanjgithub avatar May 30 '25 09:05 kannanjgithub

Server sendMessage() failed with Error is pretty relevant. The IllegalStateException is unfortunate, and there may be something we could improve to make this error handling cleaner, but an Error just occurred, not an Exception. The most likely thing is your server got an OutOfMemoryError while trying to send a message. I suggest looking at the server logs for that or another Error.

We can't send more detailed error information to the client because we mustn't expose such detailed server application details with an arbitrary client.

ejona86 avatar Jun 26 '25 04:06 ejona86

@ejona86 We don't have a OutOfMemoryError on the server.

We are now seeing this error randomly even in Unary RPC calls. For Unary calls, we don't seejava.lang.IllegalStateException: call already closed on the server, I guess that is expected as there isn't a stream. A concrete pattern that we are observing whenever this error happens is the presence of a System.gc() event on the server's gc logs.

[398834.019s][info   ][gc,start       ] GC(480) Pause Young (System.gc())
[398834.032s][info   ][gc,heap        ] GC(480) PSYoungGen: 338650K->3648K(1212416K)
[398834.032s][info   ][gc,heap        ] GC(480) ParOldGen: 1809857K->1809857K(2162176K)
[398834.032s][info   ][gc,metaspace   ] GC(480) Metaspace: 115970K(118104K)->115970K(118104K) NonClass: 104893K(106456K)->104893K(106456K) Class: 11076K(11648K)->11076K(11648K)
[398834.032s][info   ][gc             ] GC(480) Pause Young (System.gc()) 2098M->1771M(3295M) 12.162ms
[398834.032s][info   ][gc,cpu         ] GC(480) User=0.03s Sys=0.00s Real=0.01s
[398834.032s][info   ][gc,start       ] GC(481) Pause Full (System.gc())

We are unaware of the trigger of this System.gc(), we don't explicitly do that from application code. To experiment a little we set the javaopt XX +DisableExplicitGC, to turn of explicit System.gc(). After this, we haven't seen this error so far. We are still monitoring, and I will update here later.

dennyjames avatar Jun 26 '25 06:06 dennyjames

There is absolutely some Error. If it isn't OutOfMemoryError, fine, but it is definitely something. I was just saying that as the most likely.

ejona86 avatar Jun 26 '25 14:06 ejona86

Not much to do without knowing the Error being seen. Closing.

ejona86 avatar Sep 03 '25 15:09 ejona86

Hello, I would like to reopen this issue as I found logs of OutOfMemory.

Exception in thread "grpc-default-executor-2739" java.lang.OutOfMemoryError: Direct buffer memory at java.base/java.nio.Bits.reserveMemory(Bits.java:175) at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) at io.grpc.netty.shaded.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:717) at io.grpc.netty.shaded.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:692) at io.grpc.netty.shaded.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215) at io.grpc.netty.shaded.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:197) at io.grpc.netty.shaded.io.netty.buffer.PoolArena.allocate(PoolArena.java:139) at io.grpc.netty.shaded.io.netty.buffer.PoolArena.allocate(PoolArena.java:129) at io.grpc.netty.shaded.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395) at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188) at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:124) at io.grpc.netty.shaded.io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:51) at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:226) at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:172) at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:143) at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:66) at io.grpc.internal.ServerCallImpl.sendMessageInternal(ServerCallImpl.java:168) at io.grpc.internal.ServerCallImpl.sendMessage(ServerCallImpl.java:152) at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:380) at com.thoughtspot.datamanager.server.DataManagerGrpcImpl.lambda$executeStreamQuery$0(DataManagerGrpcImpl.java:189) at com.thoughtspot.datamanger.query.execution.ResultParserUtil$1.tryAdvance(ResultParserUtil.java:143) at java.base/java.util.Spliterator.forEachRemaining(Spliterator.java:326) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658) at com.thoughtspot.datamanager.server.DataManagerGrpcImpl.executeStreamQuery(DataManagerGrpcImpl.java:175) at com.thoughtspot.datamanager.DataManagerServiceGrpc$MethodHandlers.invoke(DataManagerServiceGrpc.java:1965) at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35) at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23) at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40) at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:356) at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861) at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829)

dennyjames avatar Nov 28 '25 14:11 dennyjames