google-cloud-go
google-cloud-go copied to clipboard
logging: OpenTelemetry spans are ignored as parents for LogRecords
Client
logging: https://github.com/googleapis/google-cloud-go/tree/main/logging
Current behavior
The current implementation of populateTraceInfo always uses a remote span context as the parent https://github.com/googleapis/google-cloud-go/blob/ac16144215f844294c29063776f2f116e14ecebb/logging/logging.go#L807-L838
By "Remote span context", I mean a trace ID and span ID that originate from outside the application. The code above always uses a remote span context because it explicitly parses the trace id and span id from HTTP headers.
When a user has instrumented their application with OpenTelemetry, the trace ID and span ID have already been parsed from the HTTP headers, and used to start a span. If they are using the otelhttp library, it:
- extracts the context from req.Header into a golang context.Context
- uses the context to start a span
- updates the context.Context on the req using req.WithContext
Expected behavior
If the user is using opentelemetry instrumentation, we want to use the traceID and spanID of the span that OpenTelemetry created. This way, when a user looks for logs associated with that span, the logs produced by this logging library will appear. The traceID and spanID are easy to get since otelhttp stores them in req.Context()
.
We can do something close to this:
func populateTraceInfo(e *Entry, req *http.Request) bool {
if req == nil {
if e.HTTPRequest != nil && e.HTTPRequest.Request != nil {
req = e.HTTPRequest.Request
} else {
return false
}
}
// New handing code
otelSpanContext := trace.SpanContextFromContext(req.Context())
if otelSpanContext.IsValid() {
e.Trace := otelSpanContext.TraceID()
e.SpanID := otelSpanContext.SpanID()
e.TraceSampled = e.TraceSampled | otelSpanContext.IsSampled()
return true
}
// end of new code
header := req.Header.Get("Traceparent")
if header != "" {
// do not use traceSampled flag defined by traceparent because
// flag's definition differs from expected by Cloud Tracing
traceID, spanID, _ := deconstructTraceParent(header)
if traceID != "" {
e.Trace = traceID
e.SpanID = spanID
return true
}
}
header = req.Header.Get("X-Cloud-Trace-Context")
if header != "" {
traceID, spanID, traceSampled := deconstructXCloudTraceContext(header)
if traceID != "" {
e.Trace = traceID
e.SpanID = spanID
// enforce sampling if required
e.TraceSampled = e.TraceSampled || traceSampled
return true
}
}
return false
}
To test this, we should make sure an http request with a context from a span creation ends up in the LogEntry:
req := &http.Request{}
ctx := context.Background()
ctx, _ = trace.NewNoopTracerProvider().Tracer("test instrumentation").Start(ctx, "span name")
req = req.WithContext(ctx)
e := &Entry{}
got := populateTraceInfo(e, req)
assert.True(t, got)
assert.Len(t, e.Trace, 32)
assert.Len(t, e.SpanID, 16)
// TODO test sampled somehow
Sorry for the label churn, I misunderstood where this function was.
@dashpole how can we get this to work both in the case where the user is using otel libraries and the case where they're not using otel libraries?
I'm suggesting you add a case that handles otel libraries in addition to the current parsing for traceparent and XCTC. If they aren't using OTel libraries, it should fall back to the existing logic.
Edit: I've updated the sample code above to show the full function, with the addition of OTel-handling logic.
Ahh, ok. Are there any other libraries customers are using besides this one that we might have to support in the future?
There are some projects that predated OpenTelemetry, such as OpenCensus or OpenTracing that had similar functionality, but both projects are archived in favor of OpenTelemetry. For Go, OpenTelemetry is really the only in-process tracing instrumentation that has broad adoption.
Hi @dashpole , do you know if this logic works the same for other language version of otelhttp?
When a user has instrumented their application with OpenTelemetry, the trace ID and span ID have already been parsed from the HTTP headers, and used to start a span. If they are using the otelhttp library, it:
extracts the context from req.Header into a golang context.Context uses the context to start a span updates the context.Context on the req using req.WithContext
Can we assume the suggested fix will be similar for nodejs, java and python?
Yes, although context implementations differ between languages. @aabmass might be able to help with the other languages if you have questions.
It sounds like there is some unexpected behavior re:tracing contexts used so I added the type: bug
label as part of issue triage, feel free to change that, just trying to organize things.
Hi @aabmass , I am currently working on similar fixes for this issue in nodejs-logging and java-logging repos. I am looking for similar approaches as following to to get SpanContext in node and java to propagate the correct traceId/span Ids:
// New handing code
otelSpanContext := trace.SpanContextFromContext(req.Context())
if otelSpanContext.IsValid() {
e.Trace := otelSpanContext.TraceID()
e.SpanID := otelSpanContext.SpanID()
e.TraceSampled = e.TraceSampled | otelSpanContext.IsSampled()
return true
}
// end of new code
Seems like there are quite a few OTEL instrumentation libraries. Could you point me to the right OTEL nodejs and java libraries/dependencies that I can use to retrieve the span context?
Here are where the traceId handled in logging repos currently:
- nodejs-logging: https://github.com/googleapis/nodejs-logging/blob/main/src/entry.ts#L235
- Java-logging: https://github.com/googleapis/java-logging/blob/main/google-cloud-logging/src/main/java/com/google/cloud/logging/LoggingImpl.java#L841
@aabmass For Python integrations like Django/Flask, should we use the otel library for Python, or try to integrate the otel Django/Flask integrations with our logging library?
@cindy-peng @gkevinzheng You should just need to use the trace API library for each language, rather than a specific instrumentation library. I was just trying to illustrate how they would work together, but the implementation I wrote above for Go will work no matter what library they are using.
I think for JS, you can get the current span context like this: https://github.com/open-telemetry/opentelemetry-js/blob/f86251d40fbf615be87319c8a1f5643afb820076/doc/propagation.md?plain=1#L38
const spanContext = trace.getSpanContext(context);
// Skip if the current span context is not valid or suppressed.
if (
!spanContext ||
!isSpanContextValid(spanContext) ||
isTracingSuppressed(context)
) {
// fall-back to parsing traceparent or XCTC headers
return;
}
// use spanContext.traceId, spanContext.spanId and spanContext.traceFlags & TraceFlags.SAMPLED
For java, it would look similar to this: https://github.com/open-telemetry/opentelemetry-java/blob/3e7302e742227d4526b765621a933d748012c906/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/TraceBasedExemplarFilter.java#L35
spanContext SpanContext = Span.fromContext(context).getSpanContext()
// make sure it is valid, and then use spanContext's trace id, span id, and sampled flag
Looks like in python it is something like this: https://github.com/open-telemetry/opentelemetry-python/blob/66e7d61ceba839c16d51b9297334ec8615cdf6a0/opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/init.py#L479
span_context = get_current_span().get_span_context()
// make sure it is valid? Then use spanContext's trace id, span id, and sampled flag
Thanks @dashpole! There is another impacted scenario for App Engines, Cloud Functions and Cloud Runs with automatic tracing(https://cloud.google.com/trace/docs/overview#configurations_with_automatic_tracing). As a fix for these environments, do you think we should generate our own new Span Id instead of retrieving it form headers?
More generically, for customers using tracing via Tracing Client libraries instead of Open Telemetry, do you have any suggestions for Span Id processing?
Parsing the XCTC or traceparent headers as a fallback if OpenTelemetry span context is not found (as the logging library in Go does today) will handle the automatic tracing from Cloud Run, app engine, etc. even for users not using OpenTelemetry, since those platforms set the XCTC header.
I would not recommend integrating with the non-OpenTelemetry client libraries unless there is a significant need. It should be possible to do something similar to what i'm suggesting for OTel (find the "get current span context" method in those libraries), but we expect to change the recommendation to use OTel for tracing in those languages very soon.