tracing icon indicating copy to clipboard operation
tracing copied to clipboard

Allow configuring a formatter with extra fields to include

Open bjchambers opened this issue 4 years ago • 20 comments

Feature Request

Crates

tracing-subscriber

Motivation

See https://github.com/tokio-rs/tracing/issues/1481 and https://github.com/open-telemetry/opentelemetry-rust/issues/625#issuecomment-909787903.

For connecting logs and distributed traces it would be useful to include the distributed trace ID (for instance, from OpenTelemetry) in the logs written by the JSON formatter.

This doesn't clearly fit in either tracing or opentelemetry, since it is about connecting them. One idea that emerged was if there was a method to register additional fields (or a callback to get those additional fields) with the JSON formatter, then this could be implemented by the user as adding a callback to the JSON formatter that gets the OpenTelemetry trace ID and adds that as a field.

Proposal

Add something like:

tracing_subscriber::fmt()
        .json()
        .with_max_level(tracing::Level::TRACE)
        .with_current_span(false)
        .with_extra_fields(|span| span.context().span().span_context().trace_id())
        .init();

Specifically, the callback is given the tracing Span, and then is free to use methods from OpenTelemetry, such as context()

This seems like a useful extension mechanism for augmenting log information from spans (and perhaps with_current_span and with_span_list could even be built using that mechanism).

Alternatives

The other option would be to allow a layer to add fields to the current span for the purposes of logging. Then the OpenTelemetry layer could be registered before the log layer, and add the OpenTelemetry trace-id as a field. This might be easier to use (would happen automatically when the two layers were used together) but wouldn't allow for other use cases.

bjchambers avatar Sep 01 '21 21:09 bjchambers

I'm using the same setup described in #1481 and facing a similar challenge, and an additional motivation to the one described in this issue. I would also like to have the possibility to add some generic fields to all tracing events (so they appear on both spans AND logs) like a "stage=acc/test/prod" for structured logging. I'm currently exploring the possibility to write a custom layer with the necessary visitors to handle this, but I am pretty new to both rust and tracing and progress is slow...

excentris avatar Sep 07 '21 10:09 excentris

We are facing this as well. We use Datadog and are trying to connect our Datadog logs and traces with opentelemetry-rust.

From the Datadog docs: It’s necessary to manually patch your particular logging module or library with a processor that adds the aforementioned translated TraceId and SpanId as Log attributes marked dd.trace_id and dd.span_id, respectively.

hobietje avatar Nov 18 '21 00:11 hobietje

@hobietje I had a similar issue, and tried a few ways to add it programmatically but it's hard and at the end you end up creating a new crate like tracing-bunyan-formatter. For now maybe you can use a tool like vector.dev to add or remove a field or change the log structure with Vector, And it's from Datadog so it might have a good integration with it.

mojtab23 avatar Nov 18 '21 19:11 mojtab23

I'm happy to approve a pull request to add this functionality to tracing-subscriber, if anyone is interested in implementing it.

hawkw avatar Nov 18 '21 22:11 hawkw

It's not that difficult writing a custom FormatEvent for this:

trace_id_format.rs.txt

let fmt_layer = tracing_subscriber::fmt::layer()
    //.json()
    .event_format(TraceIdFormat);

andrden avatar Dec 06 '21 15:12 andrden

That seems like a good work around. Gave that a try, although it seems like the trace ID is missing in cases where it has been set by a propagator or when its on a separate thread. Trying to see if I can figure anything specific out, but wanted to mention in case that triggered any ideas.

Any concerns around the performance of looking up the trace ID on each log? Would it be possible (and if so better) to have the opentelemtry have an extension that adds a field?

bjchambers avatar Dec 07 '21 17:12 bjchambers

trace_id_format.rs.txt Yes, trace ID was missing for nested spans, this is a fixed version, using parent() to reach the root and find the trace ID there

andrden avatar Dec 07 '21 18:12 andrden

Have you tried this? I've gotten it to include some ID in the logs, but it doesn't match the ID being used in the actual traces reported via open telemetry. Not sure how to debug.

bjchambers avatar Dec 22 '21 19:12 bjchambers

@bjchambers it seems to work for me, the trace id is the right one.

RemiKalbe avatar Jan 14 '22 17:01 RemiKalbe

@RemiKalbe interesting. I'm noticing that it gets an ID, but it doesn't map to any traces that are stored. I'm trying to figure out what is causing the problem. I'm using async and tonic, with the actual trace ID being propagated from the client to the server. So it seems like it could be a few different possibilities:

  1. Something about async (could see this since I think some of the tracing is thread-specific)
  2. Something about tonic (not sure how this could happen)
  3. Something about the propagation of the trace ID from the RPC call to the RPC handler?

Are you using any of those (so I can rule them out while digging into this)?

bjchambers avatar Jan 14 '22 23:01 bjchambers

@bjchambers

  1. I'm using Tokio
  2. Also using tonic

Here is my code

let filter = EnvFilter::try_from_default_env()
    .or_else(|_| EnvFilter::try_new("info"))
    .unwrap();

let opentelemetry_endpoint = env::var("OPENTELEMETRY_ENDPOINT").expect("Missing env var 'OPENTELEMETRY_ENDPOINT''");

let exporter = opentelemetry_otlp::new_exporter()
    .tonic()
    .with_endpoint(opentelemetry_endpoint);

let config = opentelemetry::sdk::trace::config()
    .with_resource(opentelemetry::sdk::Resource::new(vec![
        opentelemetry::KeyValue::new("service.name", "media-server"),
        opentelemetry::KeyValue::new("service.namespace", "cloud-conference"),
]));

let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(exporter)
    .with_trace_config(config)
    .install_batch(opentelemetry::runtime::Tokio)
    .expect("Failed to create opentelemtry pipeline");

let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);

let subscriber =                         
    tracing_subscriber::fmt::layer().json().with_current_span(true).event_format(TraceIdFormat);

tracing_subscriber::registry()
    .with(telemetry)
    .with(filter)
    .with(subscriber)
    .try_init()
    unwrap();

RemiKalbe avatar Jan 15 '22 12:01 RemiKalbe

@bjchambers

1. I'm using Tokio

2. Also using tonic

Here is my code

let filter = EnvFilter::try_from_default_env()
    .or_else(|_| EnvFilter::try_new("info"))
    .unwrap();

let opentelemetry_endpoint = env::var("OPENTELEMETRY_ENDPOINT").expect("Missing env var 'OPENTELEMETRY_ENDPOINT''");

let exporter = opentelemetry_otlp::new_exporter()
    .tonic()
    .with_endpoint(opentelemetry_endpoint);

let config = opentelemetry::sdk::trace::config()
    .with_resource(opentelemetry::sdk::Resource::new(vec![
        opentelemetry::KeyValue::new("service.name", "media-server"),
        opentelemetry::KeyValue::new("service.namespace", "cloud-conference"),
]));

let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(exporter)
    .with_trace_config(config)
    .install_batch(opentelemetry::runtime::Tokio)
    .expect("Failed to create opentelemtry pipeline");

let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);

let subscriber =                         
    tracing_subscriber::fmt::layer().json().with_current_span(true).event_format(TraceIdFormat);

tracing_subscriber::registry()
    .with(telemetry)
    .with(filter)
    .with(subscriber)
    .try_init()
    unwrap();

Hmm. The only things I see different then (which I'll try to spend some time looking into):

  1. Possibly the use of tracing propagation to initialize the spans on the server.
  2. The use of json().with_current_span(true). I don't think this is the issue since I believe the event_format(TraceIdFormat) supersedes the JSON format configuration, but maybe it doesn't and somehow changes the behavior?

I'll try to look more deeply and see if I can post any more details.

bjchambers avatar Jan 17 '22 18:01 bjchambers

you tried this? I've gotten it to include some ID in the logs, but it doesn't match the ID being used in the actual tr

Hi I'm implementing with axum this tracing + log and i want to show the trace_id on logs... Used this code example, but this

image

on trace_id_format.rs dont get the current context... even when the opentracing_otlp is sending to the agent.

#[tokio::main]
async fn main() {
    if env::var("RUST_LOG").is_err() {
        env::set_var("RUST_LOG", "debug");
    }

    let opentelemetry_endpoint = env::var("OPENTELEMETRY_ENDPOINT").expect("Missing env var 'OPENTELEMETRY_ENDPOINT'");

    let exporter = opentelemetry_otlp::new_exporter()
      .tonic()
      .with_endpoint(opentelemetry_endpoint);

    let config = opentelemetry::sdk::trace::config()
      .with_resource(opentelemetry::sdk::Resource::new(vec![
          opentelemetry::KeyValue::new("service.name", "svc_a"),
          opentelemetry::KeyValue::new("service.namespace", "test"),
      ]));

    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter)
        .with_trace_config(config)
        .install_batch(opentelemetry::runtime::Tokio)
        .expect("Failed to create opentelemetry pipeline");

    let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
    let subscriber = tracing_subscriber::fmt::layer().json().with_current_span(true).event_format(TraceIdFormat);

    tracing_subscriber::registry()
        .with(telemetry)
        .with(tracing_subscriber::EnvFilter::new(
            env::var("RUST_LOG").expect("RUST_LOG env variable not set"),
        ))
        .with(subscriber)
        .try_init()
        .expect("Initialization of tracing failed");

    let app = Router::new()
        .layer(TraceLayer::new_for_http())
        .route("/greet", post(greet));

    let addr = SocketAddr::from(([0, 0, 0, 0], 3000));
    tracing::info!("listening on {}", addr);

    axum::Server::bind(&addr)
        .serve(app.into_make_service())
        .await
        .unwrap();

    global::shutdown_tracer_provider();
}

lfdominguez avatar May 20 '22 06:05 lfdominguez

You can't get the SpanBuilder directly but the OtelData from which you then can retrieve the SpanBuilder. With that you don't even need to recursively traverse the parent SpanRef. Here's a snippet how I solved it on my side:

use opentelemetry::trace::{SpanId, TraceContextExt};
use tracing::Subscriber;
use tracing_opentelemetry::OtelData;
use tracing_subscriber::registry::{LookupSpan, SpanRef};

pub struct TraceInfo {
    pub trace_id: String,
    pub span_id: String,
}

fn lookup_trace_info<S>(span_ref: &SpanRef<S>) -> Option<TraceInfo>
where
    S: Subscriber + for<'a> LookupSpan<'a>,
{
    span_ref.extensions().get::<OtelData>().map(|o| TraceInfo {
        trace_id: o.parent_cx.span().span_context().trace_id().to_string(),
        span_id: o.builder.span_id.unwrap_or(SpanId::INVALID).to_string(),
    })
}


//    ...

if let Some(ref span_ref) = ctx.lookup_current() {
    if let Some(trace_info) = lookup_trace_info(span_ref) {
        serializer.serialize_entry("span_id", &trace_info.span_id)?;
        serializer.serialize_entry("trace_id", &trace_info.trace_id)?;
    }
}

//   ...

nicmue avatar May 25 '22 08:05 nicmue

Works for span_id... but check how the trace_id is a lot of 000000... (this happen with the other loop method) image

lfdominguez avatar May 25 '22 12:05 lfdominguez

In my case I have a remote trace_id which I feed into my spans with:

span.set_parent(opentelemetry::Context::new().with_remote_span_context(
    opentelemetry::trace::SpanContext::new(...),
));

With this the OtelData.parent_ctx.span().span_context() will return this said SpanContext with the remote trace_id. If you don't rely on remote trace ids you need to get the trace_id just like the span_id and do the traversal through the SpanRef.parent() till you find a valid trace_id.

However I need to admit I don't know how/if remote trace ids are fed into spans by the approach you used (TraceLayer::new_for_http()).

nicmue avatar May 25 '22 14:05 nicmue

In my case I have a remote trace_id which I feed into my spans with:

span.set_parent(opentelemetry::Context::new().with_remote_span_context(
    opentelemetry::trace::SpanContext::new(...),
));

With this the OtelData.parent_ctx.span().span_context() will return this said SpanContext with the remote trace_id. If you don't rely on remote trace ids you need to get the trace_id just like the span_id and do the traversal through the SpanRef.parent() till you find a valid trace_id.

However I need to admit I don't know how/if remote trace ids are fed into spans by the approach you used (TraceLayer::new_for_http()).

Thanks, do you have any public code where i can review how do you use that???

lfdominguez avatar May 25 '22 14:05 lfdominguez

Thanks, do you have any public code where i can review how do you use that???

No sorry that's extracted and adapted from a closed source project. However I can create a more detailed gist if your problem persists.

nicmue avatar May 25 '22 14:05 nicmue

Thanks, do you have any public code where i can review how do you use that???

No sorry that's extracted and adapted from a closed source project. However I can create a more detailed gist if your problem persists.

I would appreciate it.

lfdominguez avatar May 25 '22 14:05 lfdominguez

Ah-hah, I think this helped me get it working too. I based my code on https://github.com/tokio-rs/tracing/issues/1531#issuecomment-1136971089.

I'm using Tonic with a gRPC trace propagation. The part that I was missing was that the set_parent is setting the OTEL parent, not the tracing parent. So once you've found the "OTEL SPAN", you still need to ask for it's parent to get a trace ID. Not sure if it would also work using the SpanBuilder, but seems to work for me by going "tracing Span -> OtelData -> Otel parent -> trace ID".

bjchambers avatar May 25 '22 16:05 bjchambers

I am using this solution as well and it is working for me. Thank you. When I apply the custom event_format it stops being able to apply flatten_event. I would like to be able to pull the fields on the event onto the root json log structure. Is there a way to implement the flatten_event method for TraceIdFormat I am having a hard time figuring it out.

Edit:

Was able to get this working by converting the SerializeFieldMap into a serde_json::Value:

for (name, value) in serde_json::to_value(event.field_map())
    .unwrap()
    .as_object()
    .unwrap()
    .iter()
{
    serializer.serialize_entry(name, value)?;
}

I suspect nested values will need some special handle but this is sufficient for my use case for now.

tinkrtailor avatar Feb 07 '23 14:02 tinkrtailor

a gist or a complete example would be appreciated

FarhadF avatar May 21 '23 21:05 FarhadF

a gist or a complete example would be appreciated

I just remembered about this thread and created a small example project to showcase how we achieve a stable trace_id in our json logs: https://github.com/nicmue/tracing-stable-trace-id-example. Sorry for the delay...

nicmue avatar Jun 24 '23 09:06 nicmue

But with this approach (fmt::layer().json().event_format(json::Json)) you are giving up on the original (from tracing_subscriber) Json and roll your own.

I'd prefer to use the original Json and just use the suggested with_extra_fields method.

Still interested in a PR?

hseeberger avatar Jul 19 '23 13:07 hseeberger

@hawkw:

Still interested in a PR?

hseeberger avatar Jul 19 '23 13:07 hseeberger

I have a working prototype for only adding the OTel trace ID here: #2664.

That is used in this example app: https://github.com/hseeberger/hello-tracing-rs/pull/19.

hseeberger avatar Jul 23 '23 13:07 hseeberger

FWIW, I tried a few of the methods in this issue and couldn't get any of them to work. My current setup is Axum + Tracing + OTEL crates. I have an OTEL collector set up locally that's exporting traces to Grafana Tempo and exporting logs (filelog) to Grafana Loki.

Code for my tracing setup looks something like below:

fn init_tracing() -> TracingGuards {
    let global_filter = tracing_subscriber::EnvFilter::try_from_default_env().unwrap_or_else(|_| {
        "my_crate=trace".into()
    });

    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic().with_env())
        .install_batch(opentelemetry::runtime::Tokio)
        .expect("couldn't create OTLP tracer");
    let telemetry_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    let file_appender = tracing_appender::rolling::daily("C:/logs", "events.log");
    let (file_appender, file_appender_guard) = tracing_appender::non_blocking(file_appender);
    let file_appender_layer = tracing_subscriber::fmt::layer()
        .json()
        .event_format(event_format::Json)
        .with_writer(file_appender);

    let stdout_layer = tracing_subscriber::fmt::layer()
        .compact()
        .with_writer(std::io::stdout);

    tracing_subscriber::registry()
        .with(global_filter)
        .with(telemetry_layer)
        .with(stdout_layer)
        .with(file_appender_layer)
        .init();

    TracingGuards {
        file_appender_guard,
    }
}

I found using the OtelData propagation method @nicmue mentioned would successfully push the trace_id and span_id into my logs, but then my OTEL traces would stop being sent to Grafana Tempo. I played around with it trying to get it to work, but had no success. I then tried the SpanBuilder method @andrden mentioned, but that didn't work either.

What did work was my altered version of their solutions, which seems less elegant and likely has problems, but appears to be correct with my testing:

    fn format_event(
        &self,
        _ctx: &FmtContext<'_, S, N>,
        mut writer: Writer<'_>,
        event: &Event<'_>,
    ) -> std::fmt::Result
    where
        S: Subscriber + for<'a> LookupSpan<'a>,
    {
        let meta = event.metadata();

        let mut visit = || {
            let mut serializer = serde_json::Serializer::new(WriteAdaptor::new(&mut writer));
            let mut serializer = serializer.serialize_map(None)?;
            serializer.serialize_entry("timestamp", &Utc::now().to_rfc3339())?;
            serializer.serialize_entry("level", &meta.level().as_serde())?;
            serializer.serialize_entry("fields", &event.field_map())?;
            serializer.serialize_entry("target", meta.target())?;

            let current_span = tracing::Span::current();
            let context = current_span.context();
            let span_ref = context.span();
            let span_context = span_ref.span_context();

            let trace_id = span_context.trace_id();
            if trace_id != TraceId::INVALID {
                serializer.serialize_entry("trace_id", &trace_id.to_string())?;

                let span_id = span_context.span_id();
                if span_id != SpanId::INVALID {
                    serializer.serialize_entry("span_id", &span_id.to_string())?;
                }
            }

            serializer.end()
        };

        visit().map_err(|_| std::fmt::Error)?;
        writeln!(writer)
    }

Regardless of this working, I'd also love to have a simpler way to alter the default Json formatter vs. creating my own, as the OP mentioned and @hseeberger created a PR for.

A collaboration with the OTEL team to have a straight forward method and example of propagating the ids to JSON logs would also be welcomed as I'm sure other people are naturally going to run into this problem.

jonkight avatar Oct 21 '23 05:10 jonkight