grpc-kotlin icon indicating copy to clipboard operation
grpc-kotlin copied to clipboard

Coroutines elides useful stack information on io.grpc.StatusException

Open Groostav opened this issue 4 years ago • 4 comments
trafficstars

I was using GRPC with a whole bunch of custom kotlin wrappers, then I discovered your nice code generator and switched. I must confess I didnt do a lot of testing, and now that im into some more developing I find myself very annoyed with the kotlin code generations exception handling.

When the invocation of a service method results in an onError() being executed, the exception thrown from the suspending function looks like this:

Exception in thread "DefaultDispatcher-worker-2" io.grpc.StatusException: UNAVAILABLE: io exception
        at io.grpc.Status.asException(Status.java:549)
        at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$1.onClose(ClientCalls.kt:295)
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
        at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:830)

Which stub method was called?! Where was it called in my code?

thus in bug format:

when: calling a method where the server side will respond with onError() expected: an exception containing a trace that points to the problem stub method invocation actual: useless stack trace.


The root of this problem is the stackless coroutine implementation that is kotlin coroutines, Still:

  • for my use case, eagerly generating an exception and only throwing it if an error occurs would be fine.
    • This could be accomplished with some kind of custom grpc EagerGenerateStackTrace CoroutineContext element, meaning the default behaviour would be left as is, with the relatively minimal overhead of an if(coroutineContext[EagerGenerateStackTrace] != null) check
    • Also, new Exception() will do a ~fast native copy of the stack. Only when the caller calls getStackTrace() will the StackTraceElement[] be allocated, which is the slowest portion of exceptions.
    • the performance impact of a new Exception() call can also be hidden by doing it after the message is put on the net stack, making the time spent moot so long as the server replies in more time than is taken to call new Exception()
  • with all the metadata available to asException(), its annoying that there isn't some plain text in the error message telling me which service method blew up.

Groostav avatar Oct 26 '21 20:10 Groostav

I could put this together as a pull request if there is any interest. This shouldn't break either source compatibility or binary compatibility, but my solution, that is, embedding information in CoroutineContext, is pretty magical. Using the hidden variable passed around suspend-fun calls, even when using a nominally safe approach, is not exactly idiomatic java.

A System property could be used instead and would probably fall into more java-ish norms.

Groostav avatar Oct 26 '21 21:10 Groostav

I just ran into this problem myself -- I have a GRPC method that's failing and the stack trace in the exception handler is not reporting anything about the actual call site that issued the request. Do you happen to have a version of this that is available open-source anywhere?

cmeiklejohn avatar Apr 19 '22 02:04 cmeiklejohn

@cmeiklejohn A little reproducible example project or failing test would be super helpful for this.

jamesward avatar Apr 19 '22 12:04 jamesward

@jamesward I can try to put together an example, but I can't share the code I'm working on because it's obviously part of a much larger, not open source project. However, in our case, it's very simple: a GRPC interceptor on the client has a bug and throws an exception and the stack trace doesn't contain information about which RPC was executing.

cmeiklejohn avatar Apr 19 '22 15:04 cmeiklejohn

I've linked a PR that has a minimal test and a (at least partial) fix for this issue

zakhenry avatar Aug 22 '23 00:08 zakhenry

Thanks! I'll get this into a 1.3.1 release shorty - just waiting to see if we want to get anything else into the release.

jamesward avatar Aug 22 '23 15:08 jamesward

@zakhenry love it

cmeiklejohn avatar Aug 24 '23 16:08 cmeiklejohn

I've released v1.3.1 with this fix. Thank you @zakhenry! Let us know if there are any issues.

jamesward avatar Aug 25 '23 17:08 jamesward