opentelemetry-rust icon indicating copy to clipboard operation
opentelemetry-rust copied to clipboard

Missing and improperly nested span submission with Jaeger exporter

Open shivjm opened this issue 2 years ago • 8 comments

As outlined on Matrix, I’ve got a short-lived (periodically executed) application that uses reqwest, tracing-opentelemetry, and opentelemetry-jaeger. I noticed it frequently drops the main span when it exits, leaving the other spans scattered and flattened. I built a test repository that reproduces and explains the issue.

shivjm avatar Oct 03 '22 06:10 shivjm

It's not only related to the Jaeger exporter, but also to the OTLP (via GRPC) one.

saschagrunert avatar Oct 10 '22 11:10 saschagrunert

cc @jtescher as you have more experience in tracing-opentelemetry. I checked the example and can reproduce the issue with the stdout exporter. I suspect we didn't have the right context for the span created by HTTP client but still need to dig more to see what could be the issue

TommyCpp avatar Oct 16 '22 18:10 TommyCpp

@TommyCpp I did some digging on this problem and I think these issues are tied together to improper otel context propagation in tracing-opentelemetry. I forked the demo to show how to retain the parent span and log the span IDs.

  1. First issue of main not appearing can be solved by calling .with_context(span.context()) on run: image

    I noticed that I'd have to call both with_context and instrument to get tracing IDs to propagate: https://github.com/open-telemetry/opentelemetry-rust/pull/893

    This was the issue I initially stumbled upon because instrumentation does not handle on_enter and on_exit for tracing-opentelemetry, after some discussion on the tokio discord I took up a PR that is currently still in progress: https://github.com/tokio-rs/tracing/pull/2345

  2. Second issue is that current context span id is always one behind current span span id:

    Here's some things to notice on my fork of the example code:

    $ cargo run -- --log 'warn,rust_opentelemetry_jaeger_test=trace' --backend surf --jaeger-agent-endpoint 127.0.0.1:6831
    
     INFO Parsed arguments root=Root { url: "http://www.google.com/", jaeger_agent_endpoint: "127.0.0.1:6831", backend: "surf", json: false, log: "warn,rust_opentelemetry_jaeger_test=trace" }
     WARN pre_run span_span_id=9023d793d11614e5
     WARN pre_run cx_span_id=0000000000000000
     WARN main:run: span_span_id=5102f91ba0fbc9ca
     WARN main:run: cx_span_id=9023d793d11614e5
     WARN main:run:make_request_with_surf: make_request span_span_id=94739a4b3762018c
     WARN main:run:make_request_with_surf: make_request cx_span_id=5102f91ba0fbc9ca
     INFO main:run: Finished making request
     INFO Shut down tracer provider
    
  • Removing the instrument property on rust_opentelemetry_jaeger_test ::run produces aligned current span ids but only for one depth:

    - #[tracing::instrument(skip_all)]
    + // #[tracing::instrument(skip_all)]
    async fn run(root: &Root) {
    
     WARN pre_run span_span_id=fc91e806866aee3e
     WARN pre_run cx_span_id=0000000000000000
     WARN main: span_span_id=fc91e806866aee3e
     WARN main: cx_span_id=fc91e806866aee3e
     WARN main:make_request_with_surf: make_request span_span_id=cace6f002448669a
     WARN main:make_request_with_surf: make_request cx_span_id=fc91e806866aee3e
     INFO main: Finished making request
     INFO Shut down tracer provider
    

Ultimately the solution likely lies in toktio-telemetry with the proper handling of otel contexts for on_enter and on_exit event hooks.

mkatychev avatar Oct 16 '22 20:10 mkatychev

@mkatychev I implemented something like this in conmon-rs (https://github.com/containers/conmon-rs/pull/807), but the parent spans still seems missing from time to time.

Ref: https://github.com/tokio-rs/tracing-opentelemetry/issues/58

saschagrunert avatar Jan 26 '23 16:01 saschagrunert

@saschagrunert hope you made some progress. I haven't gotten around to dealing with the issue. Hopefully this stalled PR might provide some context: https://github.com/tokio-rs/tracing/pull/2345

mkatychev avatar Jun 05 '23 21:06 mkatychev

Hello all, can you provide a status update for this issue? Currently, the tracing integration is effectively broken from my understanding.

I am using the opentelemetry-zipkin exporter, so this issue is not only occurring with the Jaeger exporter.

AlexandreCassagne avatar Feb 16 '24 21:02 AlexandreCassagne

@AlexandreCassagne Unfortunately no. The tokio-tracing and otel integration requires some re-think/re-designs before they can fully interop! (https://github.com/open-telemetry/opentelemetry-rust/issues/1533#issuecomment-1946841743)

If you use OTel Tracing API throughout, and then something is broken, it is something we can fix in this repo. But the reality is - a lot of scenarios are already instrumented with tokio-tracing, and then interoping with OTel don't work nicely.

also related is https://github.com/open-telemetry/opentelemetry-rust/issues/1378

cijothomas avatar Feb 16 '24 22:02 cijothomas

@cijothomas That's disappointing, I integrated our tokio-tracing with opentelemetry specifically for exports to Jaeger/Zipkin. I hope that this will eventually be implemented better as distributed tracing is a very useful tool for our team.

Does this mean for those who wish to use opentelemetry, that we should integrate our tracing directly with it? or is there a middle way, such as a macro that can be used to add parent span contexts, etc? Can the OTel tracing api be used in parallel to provide links between spans?

AlexandreCassagne avatar Feb 19 '24 15:02 AlexandreCassagne