Random io.grpc.StatusRuntimeException: CANCELLED exceptions in Server-Side Streaming RPC
gRPC Version : 1.38.0 Java Version : openjdk version "11.0.27" 2025-04-15 LTS
This issue is random and is not reproducible in a test environment. Even on the environments where it is seen, it happens only intermittently.
I have a server-side streaming gRPC method that returns an Iterator to the client. While looping over the iterator, the following error occurs on the client side.
6066f128-75d7-42a2-af96-40c98bca2137 orgId=649529784 user=55fcc549-5f6a-42d7-85a0-2786f11a580b null
io.grpc.StatusRuntimeException: CANCELLED: Server sendMessage() failed with Error
at io.grpc.Status.asRuntimeException(Status.java:532)
at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:662)
at com.***.data.result.BatchedSqlResultSet.next(BatchedSqlResultSet.java:89)
at com.***.common.sql.SqlParser.getFalconResultSet(SqlParser.java:95)
at com.***.data.manager.exporter.DataExporterImpl.exportTableVisualization(DataExporterImpl.java:1064)
at com.***.data.manager.exporter.DataExporterImpl.exportTableVisualization(DataExporterImpl.java:927)
at com.***.data.manager.exporter.DataExporterImpl.exportDataForVisualization(DataExporterImpl.java:640)
at com.***.data.manager.exporter.DataExporterImpl.exportDataForVisualizations(DataExporterImpl.java:538)
On the server side, I see the following error:
java.lang.IllegalStateException: call already closed
at com.google.common.base.Preconditions.checkState(Preconditions.java:502)
at io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:218)
at io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:211)
at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
at com.***.server.DataManagerGrpcImpl.executeStreamQuery(DataManagerGrpcImpl.java:177)
The fact that the error just says CANCELLED, without any further detail, makes it hard to diagnose.
It looks like your service impl is unaware that the call has already been cancelled on the server side (which can happen due to a server-side call timeout, for example), yet it still tries to write to the response stream. If your service impl uses async processing, the ServerCall's onCancel will notify the ServerCallStreamObserver's cancellation handler so you can stop writing to the stream. If your service impl blocks the calling thread, then the ServerCall's onCancel will be called only after onMessage returns; in that case you need to use either Context (polling or a Listener) or ServerCallStreamObserver.isCancelled() (polling only) to learn of the cancellation.
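For the blocking case, the polling approach looks roughly like the sketch below. In real code the observer is the io.grpc.stub.ServerCallStreamObserver obtained by downcasting the StreamObserver passed to the handler; here it is replaced by a minimal stand-in interface so the snippet is self-contained, and the handler/iterator names are illustrative, not from the original code.

```java
import java.util.Iterator;

public class CancelAwareStreaming {
    // Minimal stand-in for the two ServerCallStreamObserver methods the
    // polling pattern needs; in real code, cast the handler's StreamObserver
    // to io.grpc.stub.ServerCallStreamObserver<T> instead.
    interface StreamObserverStub<T> {
        boolean isCancelled();   // mirrors ServerCallStreamObserver.isCancelled()
        void onNext(T value);
        void onCompleted();
    }

    // Blocking handler body: check isCancelled() before every write so we
    // never call onNext()/onCompleted() on an already-cancelled call (which
    // is what triggers "IllegalStateException: call already closed").
    static <T> void streamAll(Iterator<T> results, StreamObserverStub<T> observer) {
        while (results.hasNext()) {
            if (observer.isCancelled()) {
                return; // client is gone: stop writing, do NOT call onCompleted()
            }
            observer.onNext(results.next());
        }
        if (!observer.isCancelled()) {
            observer.onCompleted();
        }
    }
}
```

For async handlers, ServerCallStreamObserver.setOnCancelHandler(Runnable) avoids the polling entirely; the polling form above is only needed when the handler blocks the calling thread.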
"Server sendMessage() failed with Error" is pretty relevant. The IllegalStateException is unfortunate, and there may be something we could improve to make this error handling cleaner, but an Error just occurred, not an Exception. The most likely culprit is that your server hit an OutOfMemoryError while trying to send a message. I suggest searching the server logs for that or another Error.
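One way to make such Errors easier to find: gRPC callbacks run on executor threads, and an Error that escapes one is reported only via the JVM's default uncaught-exception handling on stderr ("Exception in thread ..."), which can be easy to miss. A minimal sketch, assuming no custom handler is already installed, that mirrors those Errors somewhere the application will notice (the AtomicReference here is just for illustration; a real server would call its logger):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: capture Errors (e.g. OutOfMemoryError) that escape worker threads,
// which otherwise surface only as the JVM's default stderr printout.
public class ErrorCapture {
    // Illustrative sink; a real server would log instead of storing the Error.
    static final AtomicReference<Throwable> lastError = new AtomicReference<>();

    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, t) -> {
            if (t instanceof Error) {
                lastError.set(t);
                System.err.println("FATAL Error on " + thread.getName() + ": " + t);
            }
        });
    }
}
```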
We can't send more detailed error information to the client because we mustn't expose such detailed server application details with an arbitrary client.
@ejona86 We don't have an OutOfMemoryError on the server.
We are now seeing this error randomly even in unary RPC calls. For unary calls, we don't see java.lang.IllegalStateException: call already closed on the server; I guess that is expected since there is no stream. A concrete pattern we observe whenever this error happens is a System.gc() event in the server's GC logs.
[398834.019s][info ][gc,start ] GC(480) Pause Young (System.gc())
[398834.032s][info ][gc,heap ] GC(480) PSYoungGen: 338650K->3648K(1212416K)
[398834.032s][info ][gc,heap ] GC(480) ParOldGen: 1809857K->1809857K(2162176K)
[398834.032s][info ][gc,metaspace ] GC(480) Metaspace: 115970K(118104K)->115970K(118104K) NonClass: 104893K(106456K)->104893K(106456K) Class: 11076K(11648K)->11076K(11648K)
[398834.032s][info ][gc ] GC(480) Pause Young (System.gc()) 2098M->1771M(3295M) 12.162ms
[398834.032s][info ][gc,cpu ] GC(480) User=0.03s Sys=0.00s Real=0.01s
[398834.032s][info ][gc,start ] GC(481) Pause Full (System.gc())
We are unaware of what triggers this System.gc(); we don't call it explicitly from application code. As an experiment, we set the JVM option -XX:+DisableExplicitGC to turn off explicit System.gc() calls. Since then we haven't seen the error. We are still monitoring and will update here later.
There is absolutely some Error. If it isn't OutOfMemoryError, fine, but it is definitely something; I was just naming the most likely one.
Not much to do without knowing the Error being seen. Closing.
Hello, I would like to reopen this issue, as I found logs showing an OutOfMemoryError.
Exception in thread "grpc-default-executor-2739" java.lang.OutOfMemoryError: Direct buffer memory
at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
at io.grpc.netty.shaded.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:717)
at io.grpc.netty.shaded.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:692)
at io.grpc.netty.shaded.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215)
at io.grpc.netty.shaded.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:197)
at io.grpc.netty.shaded.io.netty.buffer.PoolArena.allocate(PoolArena.java:139)
at io.grpc.netty.shaded.io.netty.buffer.PoolArena.allocate(PoolArena.java:129)
at io.grpc.netty.shaded.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395)
at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:124)
at io.grpc.netty.shaded.io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:51)
at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:226)
at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:172)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:143)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:66)
at io.grpc.internal.ServerCallImpl.sendMessageInternal(ServerCallImpl.java:168)
at io.grpc.internal.ServerCallImpl.sendMessage(ServerCallImpl.java:152)
at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:380)
at com.thoughtspot.datamanager.server.DataManagerGrpcImpl.lambda$executeStreamQuery$0(DataManagerGrpcImpl.java:189)
at com.thoughtspot.datamanger.query.execution.ResultParserUtil$1.tryAdvance(ResultParserUtil.java:143)
at java.base/java.util.Spliterator.forEachRemaining(Spliterator.java:326)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
at com.thoughtspot.datamanager.server.DataManagerGrpcImpl.executeStreamQuery(DataManagerGrpcImpl.java:175)
at com.thoughtspot.datamanager.DataManagerServiceGrpc$MethodHandlers.invoke(DataManagerServiceGrpc.java:1965)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:356)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
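The failure point is java.nio.Bits.reserveMemory, i.e. the off-heap (direct) buffer pool used by Netty hit its cap (-XX:MaxDirectMemorySize). This may also explain the System.gc() events seen earlier: on JDK 11, as far as I can tell from the JDK sources, Bits.reserveMemory invokes System.gc() as a last resort to free unreferenced direct buffers before throwing "OutOfMemoryError: Direct buffer memory" (worth verifying against your exact JDK build). A minimal sketch of watching that pool via the standard BufferPoolMXBean, so growth toward the limit is visible before allocations start failing:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Sketch: sample the "direct" buffer pool (the one java.nio.Bits accounts
// against) so usage approaching -XX:MaxDirectMemorySize can be observed.
public class DirectBufferMonitor {
    public static long directBufferBytesUsed() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1; // "direct" pool not found (unexpected on HotSpot)
    }
}
```

A periodic task could export this value to metrics; comparing it against the configured -XX:MaxDirectMemorySize (or the heap size, its default) would show how close the server runs to the limit during streaming responses.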