
Infinite trace generator

Open hamiltop opened this issue 3 years ago • 12 comments

When I use the opentelemetry_datadog library with reqwest as the client, each exported batch of spans generates an additional 2 spans. These spans look like this:

[[{'service': 'test', 'name': 'opentelemetry-datadog', 'resource': 'encode_headers', 'trace_id': 9313706354471130378, 'span_id': 2181458161362850175, 'parent_id': 0, 'start': 1647298645595811251, 'duration': 156890, 'error': 0, 'meta': {'busy_ns': '66163', 'code.namespace': 'hyper::proto::h1::role', 'code.filepath': '/home/hamiltop/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.17/src/proto/h1/role.rs', 'code.lineno': '1061', 'idle_ns': '90678'}}], [{'service': 'test', 'name': 'opentelemetry-datadog', 'resource': 'parse_headers', 'trace_id': 11785853278508097401, 'span_id': 9996945950193117364, 'parent_id': 0, 'start': 1647298645596617264, 'duration': 159966, 'error': 0, 'meta': {'code.namespace': 'hyper::proto::h1::role', 'code.lineno': '1183', 'busy_ns': '119852', 'code.filepath': '/home/hamiltop/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.17/src/proto/h1/role.rs', 'idle_ns': '36137'}}]]

I believe these come from the HTTP request made to the Datadog agent to export the original spans. The request that sends these 2 spans then generates 2 new spans, which generate 2 new spans, and so on.

To reproduce:

  1. Run this script as a mock datadog agent: https://gist.github.com/kongo2002/a4d560dca89e24bf27c8d59dc63f4b1e This will echo all incoming traces to stdout for easy inspection.
  2. Run this code:
use opentelemetry_datadog::{new_pipeline, ApiVersion};
use tracing::trace_span;
use tracing_subscriber::layer::SubscriberExt;

#[tokio::main]
async fn main() {
    // Install the Datadog exporter with a batch span processor on the Tokio
    // current-thread runtime.
    let _tracer = new_pipeline()
        .with_agent_endpoint("http://localhost:8126")
        .with_service_name("test")
        .with_version(ApiVersion::Version03)
        .install_batch(opentelemetry::runtime::TokioCurrentThread)
        .unwrap();

    // Bridge `tracing` spans into OpenTelemetry.
    let telemetry = tracing_opentelemetry::layer().with_tracer(_tracer);

    // Note: no filtering layer is added, so every span/event (including
    // TRACE level) reaches the exporter.
    let subscriber = tracing_subscriber::Registry::default().with(telemetry);

    tracing::subscriber::set_global_default(subscriber)
        .expect("Failed to install `tracing` subscriber.");

    // Emit one application span.
    trace_span!({ "test" }, "test span").in_scope(|| {
        println!("Hello, world!");
    });

    // Keep the process alive long enough to observe several export batches.
    tokio::time::sleep(tokio::time::Duration::from_secs(60)).await;

    opentelemetry::global::shutdown_tracer_provider();
}
  3. Watch the output of the mock agent script from step 1 to see the encode_headers/parse_headers spans being generated over and over again.

I would expect to have some way to disable tracing of the tracer. Is this a bug? Do I just need to configure trace levels more precisely?

hamiltop avatar Mar 14 '22 23:03 hamiltop

I think the answer here is that you're seeing a lot of spam in the output because you haven't included any filtering in the registry stack. The default behavior is to include everything, even events tagged at the TRACE level (here meaning the logging level that's one step more verbose than DEBUG, not to be confused with service traces).

In this case the hyper crate used for transport itself uses the tracing facade and can produce events. This is roughly equivalent to configuring Java logging for the Apache HTTP client package to spit out logs for all headers.

A typical application will have some filtering mechanism baked into the registry and would normally pick INFO or WARN as the default logging level rather than TRACE, increasing verbosity only for specific crates as needed. See below for an example that respects the RUST_LOG environment variable or, if it is not set, defaults to excluding all output below the INFO level.

    let subscriber = tracing_subscriber::Registry::default()
        .with(telemetry)
        .with(
            tracing_subscriber::EnvFilter::try_from_default_env()
                .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
        );
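If the goal is only to silence the exporter's own HTTP transport rather than everything below INFO, the filter directives can also target specific crates. A minimal sketch, assuming the noisy spans come from hyper and h2 (the directive string is illustrative, not an official recommendation) and that telemetry is the OpenTelemetry layer from the reproduction above:

    use tracing_subscriber::{layer::SubscriberExt, EnvFilter, Registry};

    // Keep application output at INFO and above, but drop everything emitted
    // by the exporter's HTTP stack so it never reaches the OTel layer.
    let filter = EnvFilter::new("info,hyper=off,h2=off");

    let subscriber = Registry::default().with(telemetry).with(filter);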

jherico avatar May 26 '22 22:05 jherico

Just found this issue, which was opened before I opened https://github.com/open-telemetry/opentelemetry-rust/issues/1171. The same solution should work for both, I think.

cijothomas avatar Aug 07 '23 23:08 cijothomas

Sounds like this might be related to https://github.com/open-telemetry/opentelemetry-rust/issues/473

hdost avatar Feb 21 '24 10:02 hdost

There was an attempt to fix this as part of #1330, if someone wants to look into it further; otherwise, I will get back to this issue shortly. The approach was to suppress the telemetry generated in the context of the (async) task doing the export.
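For illustration only, here is a minimal sketch of that suppression idea (not the actual #1330 implementation): a thread-local flag the export task would set around its network call, checked by a per-layer filter so the OTel layer never records the exporter's own spans. All names are hypothetical, and the paths assume the same crate layout as the reproduction above (where the SDK still lives under opentelemetry::sdk):

use std::cell::Cell;
use tracing_subscriber::{filter, layer::SubscriberExt, Layer};

thread_local! {
    // Hypothetical flag: true while the exporter's own HTTP request is in flight.
    static SUPPRESS: Cell<bool> = Cell::new(false);
}

// RAII guard that the export task would hold around its network call.
struct SuppressGuard;

impl SuppressGuard {
    fn new() -> Self {
        SUPPRESS.with(|s| s.set(true));
        SuppressGuard
    }
}

impl Drop for SuppressGuard {
    fn drop(&mut self) {
        SUPPRESS.with(|s| s.set(false));
    }
}

fn build_subscriber(tracer: opentelemetry::sdk::trace::Tracer) -> impl tracing::Subscriber {
    // Per-layer filter: the OTel layer ignores anything emitted while the
    // suppression flag is set, which breaks the export -> span -> export loop.
    let telemetry = tracing_opentelemetry::layer()
        .with_tracer(tracer)
        .with_filter(filter::filter_fn(|_meta| !SUPPRESS.with(|s| s.get())));

    tracing_subscriber::Registry::default().with(telemetry)
}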

lalitb avatar Mar 04 '24 17:03 lalitb

@jherico, filtering is one way (in fact, it's what I'm having to do right now, e.g. h2=off, to stop this problem occurring), but it means I miss out on h2 traces from my application.

Hopefully something like #1330 can fix this properly?
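Until then, one possible middle ground is per-layer filtering, so that only the exporting layer drops hyper/h2 while a local fmt layer still shows them. This is just a sketch assuming tracing-subscriber 0.3 per-layer filters, with telemetry being the OpenTelemetry layer from the reproduction above; the application's h2 spans still won't reach Datadog, they just stay visible locally:

use tracing_subscriber::{filter::{LevelFilter, Targets}, fmt, layer::SubscriberExt, Layer};

// Only the OTel/Datadog layer drops hyper/h2, so the exporter cannot feed
// its own transport spans back into itself; the fmt layer still prints them.
let otel_only = Targets::new()
    .with_default(LevelFilter::TRACE)
    .with_target("hyper", LevelFilter::OFF)
    .with_target("h2", LevelFilter::OFF);

let subscriber = tracing_subscriber::Registry::default()
    .with(telemetry.with_filter(otel_only))
    .with(fmt::layer());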

NickLarsenNZ avatar Apr 20 '24 20:04 NickLarsenNZ