opentelemetry-rust
Missing and improperly nested span submission with Jaeger exporter
As outlined on Matrix, I’ve got a short-lived (periodically executed) application that uses reqwest, tracing-opentelemetry, and opentelemetry-jaeger. I noticed it frequently drops the main span when it exits, leaving the other spans scattered and flattened. I built a test repository that reproduces and explains the issue.
It's not only related to the Jaeger exporter, but also to the OTLP (via GRPC) one.
cc @jtescher as you have more experience in tracing-opentelemetry. I checked the example and can reproduce the issue with the stdout exporter. I suspect we didn't have the right context for the span created by the HTTP client, but I still need to dig more to see what the issue could be.
@TommyCpp I did some digging on this problem and I think these issues are tied together to improper otel context propagation in tracing-opentelemetry. I forked the demo to show how to retain the parent span and log the span IDs.
- First issue, `main` not appearing: this can be solved by calling `.with_context(span.context())` on `run`. I noticed that I'd have to call both `with_context` and `instrument` to get tracing IDs to propagate: https://github.com/open-telemetry/opentelemetry-rust/pull/893. This was the issue I initially stumbled upon, because instrumentation does not handle `on_enter` and `on_exit` for `tracing-opentelemetry`; after some discussion on the tokio Discord I took up a PR that is currently still in progress: https://github.com/tokio-rs/tracing/pull/2345
- Second issue: the current context span id is always one behind the current span's span id. Here are some things to notice on my fork of the example code:

```
$ cargo run -- --log 'warn,rust_opentelemetry_jaeger_test=trace' --backend surf --jaeger-agent-endpoint 127.0.0.1:6831
INFO Parsed arguments root=Root { url: "http://www.google.com/", jaeger_agent_endpoint: "127.0.0.1:6831", backend: "surf", json: false, log: "warn,rust_opentelemetry_jaeger_test=trace" }
WARN pre_run span_span_id=9023d793d11614e5
WARN pre_run cx_span_id=0000000000000000
WARN main:run: span_span_id=5102f91ba0fbc9ca
WARN main:run: cx_span_id=9023d793d11614e5
WARN main:run:make_request_with_surf: make_request span_span_id=94739a4b3762018c
WARN main:run:make_request_with_surf: make_request cx_span_id=5102f91ba0fbc9ca
INFO main:run: Finished making request
INFO Shut down tracer provider
```
- Removing the `instrument` attribute on `rust_opentelemetry_jaeger_test::run` produces aligned current-span ids, but only for one depth:

```diff
- #[tracing::instrument(skip_all)]
+ // #[tracing::instrument(skip_all)]
  async fn run(root: &Root) {
```

```
WARN pre_run span_span_id=fc91e806866aee3e
WARN pre_run cx_span_id=0000000000000000
WARN main: span_span_id=fc91e806866aee3e
WARN main: cx_span_id=fc91e806866aee3e
WARN main:make_request_with_surf: make_request span_span_id=cace6f002448669a
WARN main:make_request_with_surf: make_request cx_span_id=fc91e806866aee3e
INFO main: Finished making request
INFO Shut down tracer provider
```
Ultimately the solution likely lies in `tracing-opentelemetry`, with proper handling of otel contexts in the `on_enter` and `on_exit` event hooks.
@mkatychev I implemented something like this in conmon-rs (https://github.com/containers/conmon-rs/pull/807), but the parent spans still seem to go missing from time to time.
Ref: https://github.com/tokio-rs/tracing-opentelemetry/issues/58
@saschagrunert hope you made some progress. I haven't gotten around to dealing with the issue. Hopefully this stalled PR might provide some context: https://github.com/tokio-rs/tracing/pull/2345
Hello all, can you provide a status update for this issue? Currently, the tracing integration is effectively broken from my understanding.
I am using the opentelemetry-zipkin exporter, so this issue is not only occurring with the Jaeger exporter.
@AlexandreCassagne Unfortunately no. The tokio-tracing and otel integration requires some re-think/re-designs before they can fully interop! (https://github.com/open-telemetry/opentelemetry-rust/issues/1533#issuecomment-1946841743)
If you use the OTel Tracing API throughout and something is broken, that is something we can fix in this repo. But the reality is that a lot of scenarios are already instrumented with tokio-tracing, and interoping those with OTel doesn't work nicely.
also related is https://github.com/open-telemetry/opentelemetry-rust/issues/1378
@cijothomas That's disappointing, I integrated our tokio-tracing with opentelemetry specifically for exports to Jaeger/Zipkin. I hope that this will eventually be implemented better as distributed tracing is a very useful tool for our team.
Does this mean that those of us who wish to use opentelemetry should integrate our tracing directly with it? Or is there a middle way, such as a macro that can be used to add parent span contexts, etc.? Can the OTel tracing API be used in parallel to provide links between spans?