grpc-java
How to differentiate an error produced on the server or client side
The problem is that when an application acts as a proxy, we need to know on which side an error was produced. We need this because the proxy has a Circuit Breaker that should compute only the errors on the client side; if something goes wrong on the server side, we do not want to compute them.
It seems that when a cancellation occurs, the proxy can receive the following exceptions, depending on the side:
First exception:
    status = {Status@16991} "Status{code=CANCELLED, description=RST_STREAM closed stream. HTTP/2 error code: CANCEL, cause=null}"
        code = {Status$Code@17001} "CANCELLED"
        description = "RST_STREAM closed stream. HTTP/2 error code: CANCEL"
        cause = null
    trailers = {Metadata@16992} "Metadata()"
    fillInStackTrace = true
    backtrace = {Object[5]@16993}
    detailMessage = "CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL"
    cause = null
    stackTrace = {StackTraceElement[69]@16998}
    depth = 69
    suppressedExceptions = {Collections$EmptyList@16996} size = 0

Second exception:
    result = {StatusRuntimeException@13049} "io.grpc.StatusRuntimeException: CANCELLED: io.grpc.Context was cancelled without error"
        status = {Status@16944} "Status{code=CANCELLED, description=io.grpc.Context was cancelled without error, cause=null}"
        trailers = {Metadata@16945} "Metadata()"
        fillInStackTrace = true
        backtrace = {Object[5]@16946}
        detailMessage = "CANCELLED: io.grpc.Context was cancelled without error"
        cause = null
        stackTrace = {StackTraceElement[70]@16951}
        depth = 70
        suppressedExceptions = {Collections$EmptyList@16949} size = 0
Can they be distinguished by the description of the Status? The exception type and the status code do not help with this.
Thanks.
I think I understand what you mean by "server side" errors: The proxy receives a call from the client, then makes a call to the target server that returns the error. But can you please clarify what you mean by a "client side" error. What component is producing this error? The client that calls the proxy? The proxy acting as a client to the target server?
Technically both of those errors would be generated on the client side, but this one was caused by an error on the server side:
RST_STREAM closed stream. HTTP/2 error code: CANCEL
While this one was generated completely based on client-side events:
io.grpc.Context was cancelled without error
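For illustration only, a heuristic built on the two descriptions quoted above might look like the following Java sketch. `CancellationOriginHeuristic` and its `Origin` enum are hypothetical names, and the description prefixes are grpc-java implementation details rather than a stable contract, so a check like this can break across versions or transports. To stay dependency-free, the sketch takes the code name and description as plain strings; in a real proxy they would come from `io.grpc.Status.fromThrowable(t).getCode().name()` and `io.grpc.Status.fromThrowable(t).getDescription()`.

```java
/**
 * Hypothetical heuristic (not a gRPC API): guesses what triggered a
 * CANCELLED status by inspecting its description text. The description
 * strings are grpc-java implementation details and may change.
 */
final class CancellationOriginHeuristic {

    enum Origin { LOCAL_CONTEXT, REMOTE_RST_STREAM, UNKNOWN }

    private CancellationOriginHeuristic() {}

    /**
     * @param code        status code name, e.g. Status.fromThrowable(t).getCode().name()
     * @param description status description, e.g. Status.fromThrowable(t).getDescription(); may be null
     */
    static Origin guess(String code, String description) {
        // Only CANCELLED statuses with a description are classified.
        if (!"CANCELLED".equals(code) || description == null) {
            return Origin.UNKNOWN;
        }
        // Generated locally when the io.grpc.Context of the call was cancelled.
        if (description.startsWith("io.grpc.Context was cancelled")) {
            return Origin.LOCAL_CONTEXT;
        }
        // Generated by the transport after the peer reset the HTTP/2 stream.
        if (description.startsWith("RST_STREAM closed stream")) {
            return Origin.REMOTE_RST_STREAM;
        }
        return Origin.UNKNOWN;
    }
}
```

Anything the sketch does not recognize falls back to `UNKNOWN`, which a caller should treat as "origin not known" rather than guessing a side.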
gRPC does not provide the "source" of the error. That is generally a very fragile concept, especially when proxies are involved.
I'm honestly surprised to see the RST_STREAM CANCEL error. I wouldn't expect that to be visible often. Do you know what situation is causing that to appear?
I don't really understand what problem you are trying to solve though. I saw that you talked about circuit breaker, but I don't really understand this "do not want to compute them" part. And then the actual examples are about cancellation which seems like it'd be a different discussion from circuit breaker.
> We need this because the proxy has a Circuit Breaker that should compute only the errors on the client side; if something goes wrong on the server side, we do not want to compute them.
I'll try to clarify the scenario with the following image:

The idea of the Circuit Breaker is to count only the errors produced in the proxy's gRPC client and ignore the errors produced between APP1 and the PROXY. The problem arises when a cancelled exception reaches the proxy and we do not know which side had the error.
The CB itself is not important; the main goal here is to find out whether, when a cancellation is thrown, there is a way to identify where the error came from.
Thanks for your answers.
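A minimal sketch of that idea, assuming a simple consecutive-failure threshold: the breaker is only ever fed the outcome of the proxy's outbound call to the target server, so failures on the APP1-to-PROXY leg never reach it. `OutboundCircuitBreaker` and `recordOutbound` are hypothetical names, and the sketch deliberately omits the half-open state and reset timeout that a production breaker needs.

```java
/**
 * Hypothetical count-based circuit breaker (not a gRPC or library API).
 * Feed it only with results of the proxy's outbound calls to the target
 * server; errors on the inbound (APP1 -> PROXY) leg are never recorded.
 */
final class OutboundCircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    OutboundCircuitBreaker(int failureThreshold) {
        if (failureThreshold <= 0) {
            throw new IllegalArgumentException("failureThreshold must be > 0");
        }
        this.failureThreshold = failureThreshold;
    }

    /** Record one completed outbound call; a success resets the failure count. */
    synchronized void recordOutbound(boolean success) {
        consecutiveFailures = success ? 0 : consecutiveFailures + 1;
    }

    /** Open (i.e. calls should be rejected) once the threshold is reached. */
    synchronized boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }
}
```

In grpc-java, one plausible place to feed it is the listener of the proxy's outbound call, e.g. calling `breaker.recordOutbound(status.isOk())` from `ClientCall.Listener.onClose(Status, Metadata)`. Note that a cancellation started by APP1 can still surface as CANCELLED on that outbound call, which is exactly the ambiguity raised in this thread.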
> The problem arises when a cancelled exception reaches the proxy and we do not know which side had the error.
In general, I think you should just treat CANCELLED differently and mostly ignore them. They really don't apply to what you're trying to detect with the circuit breaker.
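That advice can be sketched as a small filter applied before an outcome is recorded in the breaker: CANCELLED is ignored and OK is a success. `BreakerStatusFilter` is a hypothetical name, and counting every other non-OK code as a failure is an assumption; depending on what the breaker should detect, you may want to exclude more codes (for example ones caused by bad requests). The code name is passed as a string to keep the sketch dependency-free; with grpc-java it would be `status.getCode().name()`.

```java
import java.util.Set;

/**
 * Hypothetical filter (not a gRPC API) deciding whether a final status
 * of an outbound call should count as a circuit-breaker failure.
 */
final class BreakerStatusFilter {
    // Codes that never count as failures; CANCELLED is ignored entirely.
    private static final Set<String> IGNORED = Set.of("CANCELLED");

    private BreakerStatusFilter() {}

    /** @param codeName e.g. status.getCode().name() in grpc-java */
    static boolean countsAsFailure(String codeName) {
        if ("OK".equals(codeName)) {
            return false; // success, never a failure
        }
        // Assumption: every other non-ignored code counts as a failure.
        return !IGNORED.contains(codeName);
    }
}
```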
But it also seems this goes back to this question of mine:
> I'm honestly surprised to see the RST_STREAM CANCEL error. I wouldn't expect that to be visible often. Do you know what situation is causing that to appear?
> In general, I think you should just treat CANCELLED differently and mostly ignore them. They really don't apply to what you're trying to detect with the circuit breaker.
It seems to me that CANCELLED can be a common error. An exceeded deadline is one example: the gRPC client that creates the Context (with a deadline) fails with DEADLINE_EXCEEDED, but the subsequent chain of calls finishes with CANCELLED. The problem is that CANCELLED does not necessarily originate on the client side (the description of the CANCELLED code is "The operation was cancelled, typically by the caller"). That makes me unsure about the origin of the error, because I can't tell which microservice it came from.
> But it also seems this goes back to this question of mine:
> > I'm honestly surprised to see the RST_STREAM CANCEL error. I wouldn't expect that to be visible often. Do you know what situation is causing that to appear?
That error was only an example; I forced it by cancelling the Context on the server side, because I wanted to know whether I could distinguish the side that originated the error.
Thanks for your time.