grpc-java
grpc-java copied to clipboard
Throwing an exception from some StreamTracer methods hangs the RPC
What version of gRPC-Java are you using?
Head
What is your environment?
Android
What did you expect to see?
If a client-implemented ClientStreamTracer method throws an exception, the RPC should fail.
What did you see instead?
The RPC never completes (blocks forever or the future never completes, depending on stub type).
Steps to reproduce the bug
I was able to reproduce this bug with both the InProcessChannel and Cronet transport implementations.
From grpc's GrpcServerRuleTest:
@Test
public void testTracer() throws Exception {
TestServiceImpl testService = new TestServiceImpl();
grpcServerRule1.getServiceRegistry().addService(testService);
SimpleServiceGrpc.SimpleServiceBlockingStub stub =
SimpleServiceGrpc.newBlockingStub(grpcServerRule1.getChannel());
SimpleRequest request = SimpleRequest.getDefaultInstance();
SimpleResponse resp =
stub.withInterceptors(
new ClientInterceptor() {
@Override
public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
return next.newCall(
method,
callOptions.withStreamTracerFactory(
new ClientStreamTracer.Factory() {
@Override
public ClientStreamTracer newClientStreamTracer(
ClientStreamTracer.StreamInfo info, Metadata headers) {
return new ClientStreamTracer() {
@Override
public void streamClosed(Status status) {
throw new RuntimeException();
}
};
}
}));
}
})
.unaryRpc(request);
assertThat(resp).isEqualTo(SimpleResponse.getDefaultInstance());
}
Spoke with @groakley and he mentioned that @wanyingd1996 might be interested in taking this on. But apparently I can't directly assign them... (previous contributor in https://github.com/grpc/grpc-java/pull/7242)
I reproduced the bug using InProcessChannel, and the trace shows the problem is inside of notifyClientClose method in InProcessTransport.java. The streamClosed method is called by clientStream.statsTraceCtx.streamClosed(clientStatus);, and if this fails, notifyClientClose will fail, then client will never get notified to be closed. Same problem can be reproduced by throwing RuntTimeException from any methods that is called by notifyClientClose. Unless there is other way to notify client, It seems the only way to avoid hanging RPC in this situation is to catch and do not throw in these StreamTracer related methods.
At a minimum, we'd want to make sure that any client-implemented method that throws doesn't hang the stub or prevent resources from being released.
Looking at InProcessTransport, one way to achieve that might be to modify
private void notifyClientClose(Status status, Metadata trailers) {
Status clientStatus = cleanStatus(status, includeCauseWithStatus);
synchronized (this) {
if (closed) {
return;
}
if (clientReceiveQueue.isEmpty()) {
closed = true;
clientStream.statsTraceCtx.clientInboundTrailers(trailers);
clientStream.statsTraceCtx.streamClosed(clientStatus);
clientStreamListener.closed(clientStatus, trailers);
} else {
clientNotifyStatus = clientStatus;
clientNotifyTrailers = trailers;
}
}
streamClosed();
}
to be
private void notifyClientClose(Status status, Metadata trailers) {
Status clientStatus = cleanStatus(status, includeCauseWithStatus);
synchronized (this) {
if (closed) {
return;
}
if (clientReceiveQueue.isEmpty()) {
closed = true;
clientStatus = runUpdatingStatusOnFailure(clientStatus, () -> {clientStream.statsTraceCtx.clientInboundTrailers(trailers);});
clientStatus = runUpdatingStatusOnFailure(clientStatus, () -> {clientStream.statsTraceCtx.streamClosed(clientStatus);});
clientNotifyStatus = runUpdatingStatusOnFailure(clientStatus, () -> {clientStreamListener.closed(clientStatus, trailers);});
} else {
clientNotifyStatus = clientStatus;
clientNotifyTrailers = trailers;
}
}
streamClosed();
}
private Status runUpdatingStatusOnFailure(Status status, Runnable command) {
try {
command.run();
} catch (Throwable t) {
return cleanStatus(Status.fromThrowable(t), includeCauseWithStatus);
}
return status;
}
@ericgribkoff , do you think we should go with the fix suggested by @groakley ?
At a minimum, we'd want to make sure that any client-implemented method that throws doesn't hang the stub or prevent resources from being released.
Yes. We also will want to update the javadoc for StreamTracer to say that implementations must not throw.
Regarding the runUpdatingStatusOnFailure approach, that would fix the hanging but we can/should actually be more stringent here about dealing with an exception if one is thrown: we should at the least shutdown the transport if any StreamTracer throws (moving the entire channel into a panic state might be even better, but might be tricky to actually accomplish).
@wanyingd1996 Wrapping the calls to statsTraceCtx methods in try/catch blocks should work (I think bad things can happen if an exception is thrown by a stream tracer even outside of the notifyClientClose call?). There are 18 calls to statsTraceCtx methods in InProcessTransport, plus some more in AbstractClientStream and a few in Netty/OkHttp specific code, so that approach might be somewhat unwieldy...catching any exception somewhere higher up in the call stack (whether it comes from a stream tracer or a bug in grpc code itself) and responding by shutting down the transport or panic'ing the channel with a clear error message could be a cleaner approach to this, but I haven't looked into details to see where/how this could best be done.
@ericgribkoff I think catching the exception in StatsTraceContext instead of in notifyClientClose would be clear, but it seems I can't shutdown the transport in StatsTraceContext. I am fairly new to grpc, do you have any suggestions?
@wanyingd1996 Are you still interested in working on this issue?
@wanyingd1996 Are you still interested in working on this issue?
Hi, Larry, I'm not working on grpc anymore, and I don't have the bandwidth for the bug now. feel free to take this. Thanks!
Started analysis, going through the repro steps.
Here is my understanding of the issue. The suggestion is that the transport should be shutdown when any of the Stream tracer methods throw an exception, to prevent the subchannel from hanging. It appears that since StatsTraceContext wraps the calls to the StreamTracers the exception handling can be in this class. StatsTraceContext is used both on client side and server side, for the respective transports, and it calls stream tracer methods for collecting trace information about the respective transport. Only the ServerTransport has a shutdown method, not the ClientTransport, so the shutting down the transport only applies when a server side stream tracer throws an exception (the exception being for InProcessTransport). For other transports what should the StreamTracerContext do after catching an exception from a client StreamTracer? When creating StatsTraceContext on the server side from a Transport impl we could add the ServerTransport.this as an argument and store it as a field in StatsTraceContext, to call shutdown on it when any of the server side StreamTracer methods throw an exception. The example code given in this issue uses InProcessTransport and client StreamTracers. Since the process is common to client and server, the server close sequence itself calls the client stream tracers
streamClosed:170, StatsTraceContext (io.grpc.internal)
lambda$notifyClientClose$6:616, InProcessTransport$InProcessStream$InProcessServerStream (io.grpc.inprocess)
run:-1, InProcessTransport$InProcessStream$InProcessServerStream$$Lambda/0x00007f631f163248 (io.grpc.inprocess)
runUpdatingStatusOnFailure:633, InProcessTransport$InProcessStream$InProcessServerStream (io.grpc.inprocess)
notifyClientClose:615, InProcessTransport$InProcessStream$InProcessServerStream (io.grpc.inprocess)
close:601, InProcessTransport$InProcessStream$InProcessServerStream (io.grpc.inprocess)
closeInternal:227, ServerCallImpl (io.grpc.internal)
close:213, ServerCallImpl (io.grpc.internal)
onCompleted:395, ServerCalls$ServerCallStreamObserverImpl (io.grpc.stub)
unaryRpc:255, GrpcServerRuleTest$TestServiceImpl (io.grpc.testing)
invoke:410, SimpleServiceGrpc$MethodHandlers (io.grpc.testing.protobuf)
onHalfClose:182, ServerCalls$UnaryServerCallHandler$UnaryServerCallListener (io.grpc.stub)
halfClosed:356, ServerCallImpl$ServerStreamListenerImpl (io.grpc.internal)
runInContext:861, ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed (io.grpc.internal)
and only for InProcessTransport we would shutdown the transport when a client StreamTracer method throws. For InProcessTransport we may store the server transport in the StatsTraceContext created for the client side also.
@ericgribkoff please confirm.