grpclib
Sample for implementing opentelemetry tracing
It would be great to be able to trace our gRPC servers and clients using opentelemetry. How would I tie it into this library?
I would like to have each message handler be called in a context (or passed one) that I can create further child spans from and attach logs to.
Since 0.4.2rc2 there are status-related properties in the SendTrailingMetadata and RecvTrailingMetadata events: https://grpclib.readthedocs.io/en/latest/events.html#grpclib.events.RecvTrailingMetadata
So it is now possible to add tracing by using the events system.
I’ve started doing this, but I have to use contextvars and associate the listeners whenever I start a server, rather than having it in the config of the library.
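A minimal sketch of what event-based tracing could look like, assuming the `RecvRequest` and `SendTrailingMetadata` callbacks for one call run in the same task (so a context variable set in the first is visible in the second). The `current_call` variable, the `finished_calls` list and the `attach_tracing` helper are hypothetical names for illustration; a real setup would create and end OpenTelemetry spans instead of filling a dict.

```python
import contextvars
import time

# Hypothetical per-request record standing in for a real tracing span.
current_call = contextvars.ContextVar("current_call", default=None)
finished_calls = []  # hypothetical stand-in for a span exporter


async def on_recv_request(event):
    # RecvRequest exposes the method name and the request metadata.
    current_call.set({"name": event.method_name, "start": time.monotonic()})


async def on_send_trailing_metadata(event):
    # SendTrailingMetadata carries the status-related properties of the call.
    call = current_call.get()
    if call is None:
        return
    call["status"] = event.status
    call["status_message"] = event.status_message
    call["duration"] = time.monotonic() - call["start"]
    finished_calls.append(call)


def attach_tracing(server):
    # Imported lazily so the callbacks above can be exercised without grpclib.
    from grpclib.events import listen, RecvRequest, SendTrailingMetadata

    listen(server, RecvRequest, on_recv_request)
    listen(server, SendTrailingMetadata, on_send_trailing_metadata)
```

Calling `attach_tracing(server)` once after creating the server and before starting it would be enough to register both listeners.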
Could you possibly provide an example that gets the unary/stream combinations right?
> associate the listeners whenever I start a server; rather than having it in the config of the library
You can (1) create a server instance and attach listeners before (2) starting the server. (1) and (2) can be done in different places. This is almost the same as providing interceptors in other libraries when you create a server:
https://github.com/open-telemetry/opentelemetry-python-contrib/blob/d9c01168716481abac185cc9d5c71462b5722179/instrumentation/opentelemetry-instrumentation-grpc/tests/test_server_interceptor.py#L290-L297
> Could you possibly provide an example that gets the unary/stream combinations right?
Can you elaborate? What do you mean by "unary/stream combinations right"?
You can also implement this using a monkey-patching approach like this:
https://github.com/open-telemetry/opentelemetry-python-contrib/blob/d9c01168716481abac185cc9d5c71462b5722179/instrumentation/opentelemetry-instrumentation-grpc/tests/test_server_interceptor.py#L83-L84
I don't see any limitations in grpclib.
> Can you elaborate? What do you mean by "unary/stream combinations right"?
Yes. For example, a stream-request/stream-response call would have to create PRODUCER/CONSUMER spans rather than CLIENT/SERVER spans. And if one side closes the connection, there may be spans that are only "associated with" a trace after the fact, rather than having an explicit parent sent from the client at runtime. This is especially so if you're also creating traces for the lifetimes of objects in your app (e.g. I start a new trace when I start the server).
I'm also trying to come to terms with how to manage tracing of Python asyncio tasks (again, "associated with"-type spans?).
There's also the matter of reading request data from the metadata of requests (caller started trace, provides a SpanContext to the server); a sample of how that should be done in gRPC would be a nice addition (and I'll get there, but I haven't investigated this path fully yet).
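As a rough illustration of reading a caller's trace context from request metadata, here is a sketch that parses a W3C `traceparent` value (version `00` layout) out of a mapping. It assumes the client sends the context under the `traceparent` key; the `parse_traceparent` helper is a hypothetical name. With OpenTelemetry installed, `opentelemetry.propagate.extract` would normally do this work instead.

```python
import re

# version "00" layout: version-traceid-parentid-flags, all lowercase hex
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(metadata):
    """Return the caller's trace context as a dict, or None if absent/invalid."""
    value = metadata.get("traceparent")
    if value is None:
        return None
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero ids are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

In a `RecvRequest` listener this could be called as `parse_traceparent(event.metadata)`, and the result used as the parent of the server-side span.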
As for trailing metadata, I haven't been able to find a good resource on this. From the code docs I gather that it's what is sent as an "ending" to streams? Or can it be sent multiple times during a request/response interaction?
As for the events, I haven't been able to figure out how to capture exceptions using the eventing system. It's not just about providing a span to the Handler, but also about capturing exceptions from it. I've resorted to an explicit get() in the function body of the Handler, because then I can capture stack traces. An example of how to manage these sorts of errors would be nice, including a "bad request" thrown from the Handler as a gRPC message.
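One possible way to capture handler exceptions through the events system, sketched under the assumption that `RecvRequest.method_func` can be reassigned from a listener (please check this against the events documentation for your grpclib version). The `trace_exceptions` wrapper and the `captured_errors` list are hypothetical; real code would record the exception on the current span instead.

```python
import functools
import traceback

captured_errors = []  # hypothetical stand-in for recording on a span


def trace_exceptions(method_func):
    """Wrap a handler coroutine function so exceptions are recorded, then re-raised."""

    @functools.wraps(method_func)
    async def wrapper(stream):
        try:
            return await method_func(stream)
        except Exception as exc:
            captured_errors.append(
                {
                    "type": type(exc).__name__,
                    "stacktrace": traceback.format_exc(),
                }
            )
            raise  # let the server turn the error into a gRPC status

    return wrapper


async def on_recv_request(event):
    # Assumption: method_func is a mutable attribute of the RecvRequest event.
    event.method_func = trace_exceptions(event.method_func)
```

Because the wrapper re-raises, an error deliberately raised by the handler to signal a bad request would still reach the server unchanged, while the stack trace is captured along the way.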
It would also be interesting to hear if specifically contextvars are the recommended solution? I read you said so in a previous answer to an issue?
I'm not much for monkey patching, coming from statically typed languages... It's often more complex to debug and relies on implementation details rather than APIs.
Adapted opentelemetry-instrumentation-grpc to the helloworld example: https://gist.github.com/vmagamedov/19a29f7a4f8f70d76bbc797a0e994112
Should be enough to understand how to extract request metadata, exceptions, status etc.
This example is just a POC, I still don't understand how attach(extract(event.metadata)) works :) I don't see request metadata in the console (ConsoleSpanExporter), so I can't test that this actually works.
> It would also be interesting to hear if specifically contextvars are the recommended solution? I read you said so in a previous answer to an issue?
Yes, this is how context propagation in Python works. Under the hood, everyone uses contextvars.
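A small standard-library sketch of that propagation: `asyncio.create_task` runs the new task in a copy of the current context, so a child task sees the value the parent set, while changes made inside the child do not leak back. The `current_span` variable and the string values are placeholders for real span objects.

```python
import asyncio
import contextvars

# Placeholder for "the active span"; real tracing libraries keep theirs similarly.
current_span = contextvars.ContextVar("current_span", default=None)


async def child(results):
    # The task started with a copy of the parent's context.
    results.append(("child sees", current_span.get()))
    # This change stays inside the child's own context copy.
    current_span.set("child-span")


async def handler():
    results = []
    current_span.set("request-span")
    await asyncio.create_task(child(results))
    results.append(("parent still sees", current_span.get()))
    return results


if __name__ == "__main__":
    print(asyncio.run(handler()))
```

This is why a span set at the start of a request is visible in tasks spawned by the handler, without passing it around explicitly.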
Hi @vmagamedov, is this gist still up to date with the best way of doing tracing? :)
For example, isn't this more idiomatic?
It is definitely not the best way of doing tracing, just a quick proof of concept.