google-cloud-go icon indicating copy to clipboard operation
google-cloud-go copied to clipboard

logging: OpenTelemetry spans are ignored as parents for LogRecords

Open dashpole opened this issue 1 year ago • 15 comments

Client

logging: https://github.com/googleapis/google-cloud-go/tree/main/logging

Current behavior

The current implementation of populateTraceInfo always uses a remote span context as the parent https://github.com/googleapis/google-cloud-go/blob/ac16144215f844294c29063776f2f116e14ecebb/logging/logging.go#L807-L838

By "Remote span context", I mean a trace ID and span ID that originate from outside the application. The code above always uses a remote span context because it explicitly parses the trace id and span id from HTTP headers.

When a user has instrumented their application with OpenTelemetry, the trace ID and span ID have already been parsed from the HTTP headers, and used to start a span. If they are using the otelhttp library, it:

  1. extracts the context from req.Header into a golang context.Context
  2. uses the context to start a span
  3. updates the context.Context on the req using req.WithContext

Expected behavior

If the user is using opentelemetry instrumentation, we want to use the traceID and spanID of the span that OpenTelemetry created. This way, when a user looks for logs associated with that span, the logs produced by this logging library will appear. The traceID and spanID are easy to get since otelhttp stores them in req.Context().

We can do something close to this:


func populateTraceInfo(e *Entry, req *http.Request) bool {
	if req == nil {
		if e.HTTPRequest != nil && e.HTTPRequest.Request != nil {
			req = e.HTTPRequest.Request
		} else {
			return false
		}
	}
        // New handing code
        otelSpanContext := trace.SpanContextFromContext(req.Context())
	if otelSpanContext.IsValid() {
	        e.Trace := otelSpanContext.TraceID()
	        e.SpanID := otelSpanContext.SpanID()
                e.TraceSampled = e.TraceSampled | otelSpanContext.IsSampled()
                return true
	}
        // end of new code
	header := req.Header.Get("Traceparent")
         if header != "" { 
 		// do not use traceSampled flag defined by traceparent because 
 		// flag's definition differs from expected by Cloud Tracing 
 		traceID, spanID, _ := deconstructTraceParent(header) 
 		if traceID != "" { 
 			e.Trace = traceID 
 			e.SpanID = spanID 
 			return true 
 		} 
 	} 
 	header = req.Header.Get("X-Cloud-Trace-Context") 
 	if header != "" { 
 		traceID, spanID, traceSampled := deconstructXCloudTraceContext(header) 
 		if traceID != "" { 
 			e.Trace = traceID 
 			e.SpanID = spanID 
 			// enforce sampling if required 
 			e.TraceSampled = e.TraceSampled || traceSampled 
 			return true 
 		} 
 	} 
 	return false 
 } 

To test this, we should make sure an http request with a context from a span creation ends up in the LogEntry:

    req := &http.Request{}
    ctx := context.Background()
    ctx, _ = trace.NewNoopTracerProvider().Tracer("test instrumentation").Start(ctx, "span name")
    req = req.WithContext(ctx)
    e := &Entry{}
    got := populateTraceInfo(e, req)
    assert.True(t, got)
    assert.Len(t, e.Trace, 32)
    assert.Len(t, e.SpanID, 16)
    // TODO test sampled somehow

dashpole avatar Jan 25 '24 18:01 dashpole

Sorry for the label churn, I misunderstood where this function was.

quartzmo avatar Jan 26 '24 19:01 quartzmo

@dashpole how can we get this to work both in the case where the user is using otel libraries and the case where they're not using otel libraries?

gkevinzheng avatar Jan 30 '24 16:01 gkevinzheng

I'm suggesting you add a case that handles otel libraries in addition to the current parsing for traceparent and XCTC. If they aren't using OTel libraries, it should fall back to the existing logic.

Edit: I've updated the sample code above to show the full function, with the addition of OTel-handling logic.

dashpole avatar Jan 30 '24 16:01 dashpole

Ahh, ok. Are there any other libraries customers are using besides this one that we might have to support in the future?

gkevinzheng avatar Jan 30 '24 16:01 gkevinzheng

There are some projects that predated OpenTelemetry, such as OpenCensus or OpenTracing that had similar functionality, but both projects are archived in favor of OpenTelemetry. For Go, OpenTelemetry is really the only in-process tracing instrumentation that has broad adoption.

dashpole avatar Jan 30 '24 16:01 dashpole

Hi @dashpole , do you know if this logic works the same for other language version of otelhttp?

When a user has instrumented their application with OpenTelemetry, the trace ID and span ID have already been parsed from the HTTP headers, and used to start a span. If they are using the otelhttp library, it:

extracts the context from req.Header into a golang context.Context uses the context to start a span updates the context.Context on the req using req.WithContext

Can we assume the suggested fix will be similar for nodejs, java and python?

cindy-peng avatar Jan 30 '24 17:01 cindy-peng

Yes, although context implementations differ between languages. @aabmass might be able to help with the other languages if you have questions.

dashpole avatar Jan 30 '24 17:01 dashpole

It sounds like there is some unexpected behavior re:tracing contexts used so I added the type: bug label as part of issue triage, feel free to change that, just trying to organize things.

noahdietz avatar Jan 30 '24 19:01 noahdietz

Hi @aabmass , I am currently working on similar fixes for this issue in nodejs-logging and java-logging repos. I am looking for similar approaches as following to to get SpanContext in node and java to propagate the correct traceId/span Ids:

        // New handing code
        otelSpanContext := trace.SpanContextFromContext(req.Context())
	if otelSpanContext.IsValid() {
	        e.Trace := otelSpanContext.TraceID()
	        e.SpanID := otelSpanContext.SpanID()
                e.TraceSampled = e.TraceSampled | otelSpanContext.IsSampled()
                return true
	}
        // end of new code

Seems like there are quite a few OTEL instrumentation libraries. Could you point me to the right OTEL nodejs and java libraries/dependencies that I can use to retrieve the span context?

Here are where the traceId handled in logging repos currently:

  1. nodejs-logging: https://github.com/googleapis/nodejs-logging/blob/main/src/entry.ts#L235
  2. Java-logging: https://github.com/googleapis/java-logging/blob/main/google-cloud-logging/src/main/java/com/google/cloud/logging/LoggingImpl.java#L841

cindy-peng avatar Feb 06 '24 15:02 cindy-peng

@aabmass For Python integrations like Django/Flask, should we use the otel library for Python, or try to integrate the otel Django/Flask integrations with our logging library?

gkevinzheng avatar Feb 06 '24 16:02 gkevinzheng

@cindy-peng @gkevinzheng You should just need to use the trace API library for each language, rather than a specific instrumentation library. I was just trying to illustrate how they would work together, but the implementation I wrote above for Go will work no matter what library they are using.

I think for JS, you can get the current span context like this: https://github.com/open-telemetry/opentelemetry-js/blob/f86251d40fbf615be87319c8a1f5643afb820076/doc/propagation.md?plain=1#L38

    const spanContext = trace.getSpanContext(context);
    // Skip if the current span context is not valid or suppressed.
    if (
      !spanContext ||
      !isSpanContextValid(spanContext) ||
      isTracingSuppressed(context)
    ) {
      // fall-back to parsing traceparent or XCTC headers
      return;
    }
    // use spanContext.traceId, spanContext.spanId and spanContext.traceFlags & TraceFlags.SAMPLED

dashpole avatar Feb 06 '24 17:02 dashpole

For java, it would look similar to this: https://github.com/open-telemetry/opentelemetry-java/blob/3e7302e742227d4526b765621a933d748012c906/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/TraceBasedExemplarFilter.java#L35

    spanContext SpanContext = Span.fromContext(context).getSpanContext()
    // make sure it is valid, and then use spanContext's trace id, span id, and sampled flag

dashpole avatar Feb 06 '24 17:02 dashpole

Looks like in python it is something like this: https://github.com/open-telemetry/opentelemetry-python/blob/66e7d61ceba839c16d51b9297334ec8615cdf6a0/opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/init.py#L479

    span_context = get_current_span().get_span_context()
    // make sure it is valid?  Then use spanContext's  trace id, span id, and sampled flag

dashpole avatar Feb 06 '24 17:02 dashpole

Thanks @dashpole! There is another impacted scenario for App Engines, Cloud Functions and Cloud Runs with automatic tracing(https://cloud.google.com/trace/docs/overview#configurations_with_automatic_tracing). As a fix for these environments, do you think we should generate our own new Span Id instead of retrieving it form headers?

More generically, for customers using tracing via Tracing Client libraries instead of Open Telemetry, do you have any suggestions for Span Id processing?

cindy-peng avatar Feb 06 '24 17:02 cindy-peng

Parsing the XCTC or traceparent headers as a fallback if OpenTelemetry span context is not found (as the logging library in Go does today) will handle the automatic tracing from Cloud Run, app engine, etc. even for users not using OpenTelemetry, since those platforms set the XCTC header.

I would not recommend integrating with the non-OpenTelemetry client libraries unless there is a significant need. It should be possible to do something similar to what i'm suggesting for OTel (find the "get current span context" method in those libraries), but we expect to change the recommendation to use OTel for tracing in those languages very soon.

dashpole avatar Feb 06 '24 18:02 dashpole