Trace propagation with `Traceparent` and otel tracer doesn't work
Describe the bug
When using the experimental `-use-otel-trace=true` flag, traces aren't propagated correctly, even when the client sends a `Traceparent` header with the request. I believe the problem comes from `ExtractTraceID` in the weaveworks/common library, which looks for a Jaeger `SpanContext` in the request `Context`, but I'm not 100% sure.
To Reproduce
Steps to reproduce the behavior:
- Start Tempo v1.3.0
- Send a request to any API (I used /api/search and /api/traces/{traceID} when testing) along with a valid `Traceparent` header. Make sure your client script is exporting traces to Tempo as well.
- Grab the trace ID sent by the client and look for it in Tempo.
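The second step can be sketched like this. A minimal example in Python, with a hypothetical local Tempo endpoint (adjust host/port for your deployment); the request is only constructed here, not actually sent:

```python
import urllib.request

# A syntactically valid W3C traceparent header (in a real client these
# values would come from your tracer's current span context).
traceparent = "00-94f61d7bd16071fb6d826ccfefec5c83-6947945713d7c017-01"

# Hypothetical local Tempo endpoint; 3200 is Tempo's default HTTP port.
req = urllib.request.Request(
    "http://localhost:3200/api/search",
    headers={"Traceparent": traceparent},
)

# The header travels with the request; urllib normalizes its capitalization.
assert req.get_header("Traceparent") == traceparent
```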
Expected behavior
I would expect that enabling the OpenTelemetry tracer would automatically enable trace propagation following the Trace Context specification.
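For reference, the `traceparent` header defined by the W3C Trace Context specification has four dash-separated hex fields: version, trace-id, parent-id, and trace-flags. A minimal stdlib-only parser as an illustration (the function name is my own, not part of any library):

```python
import re

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags,
# per the W3C Trace Context specification.
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> dict:
    """Split a traceparent header into its fields, or raise ValueError."""
    m = TRACEPARENT_RE.match(header.strip().lower())
    if m is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    fields = m.groupdict()
    # An all-zero trace-id or parent-id is invalid per the spec.
    if set(fields["trace_id"]) == {"0"} or set(fields["parent_id"]) == {"0"}:
        raise ValueError("trace-id and parent-id must be non-zero")
    return fields

parse_traceparent("00-94f61d7bd16071fb6d826ccfefec5c83-6947945713d7c017-01")
```

A server honoring the spec would extract the trace-id and parent-id from this header and parent its own spans on them.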
Is there any ongoing work related to this?
None that I'm aware of. We continue to use the Jaeger clients internally.
These, however, have been deprecated and we intend to move to OTel clients at some point.
Is this still unresolved or does a workaround exist?
I have a client app sending traces to Grafana Cloud and making HTTP calls to a server API, which also sends its own traces to the same Grafana Cloud Tempo data source. It would be great to show the server spans inside the caller's trace! Is there any way to achieve this with the OpenTelemetry SDK and Grafana Cloud Tempo? Honestly, I'm surprised those traces aren't linked or linkable, considering they live in the same data source, while distributed tracing aims to propagate context even across different tracing systems.
Yes, this is working!
In particular, you can adapt it to detached spans too, considering the following.
Client side, where the first root span is started and must be the parent (`tracer`, `prop`, and `carrier` were created earlier in the session: `prop` is a `TraceContextTextMapPropagator` and `carrier` an empty dict):
>>> a=tracer.start_span('first')
>>> a.context
SpanContext(trace_id=0x94f61d7bd16071fb6d826ccfefec5c83, span_id=0x6947945713d7c017, trace_flags=0x01, trace_state=[], is_remote=False)
>>> from opentelemetry.trace.propagation import set_span_in_context
>>> ctx = set_span_in_context(a)
>>> ctx
{'current-span-6e6e35f7-bc0b-4ec9-84f0-7af5b96ec769': _Span(name="first", context=SpanContext(trace_id=0x94f61d7bd16071fb6d826ccfefec5c83, span_id=0x6947945713d7c017, trace_flags=0x01, trace_state=[], is_remote=False))}
>>> prop.inject(carrier=carrier,context=ctx)
>>> carrier
{'traceparent': '00-94f61d7bd16071fb6d826ccfefec5c83-6947945713d7c017-01'}
Server side, where the second span is started and becomes the child of the extracted context:
>>> ctx = prop.extract(carrier=carrier)
>>> ctx
{'current-span-6e6e35f7-bc0b-4ec9-84f0-7af5b96ec769': NonRecordingSpan(SpanContext(trace_id=0x94f61d7bd16071fb6d826ccfefec5c83, span_id=0x6947945713d7c017, trace_flags=0x01, trace_state=[], is_remote=True))}
>>> b=tracer.start_span('second',context=ctx)
>>> b.end()
>>> {
"name": "second",
"context": {
"trace_id": "0x94f61d7bd16071fb6d826ccfefec5c83",
"span_id": "0x45f7b33030725702",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0x6947945713d7c017",
"start_time": "2022-08-26T03:13:18.598479Z",
"end_time": "2022-08-26T03:13:23.023070Z",
"status": {
"status_code": "UNSET"
},
"attributes": {},
"events": [],
"links": [],
"resource": {
"service.name": "service-pyshell"
}
}
The magic begins: at this point you get a `<root span not yet received>` placeholder in Grafana Cloud trace search, since only the child span has been exported so far!
>>> a.end()
>>> {
"name": "first",
"context": {
"trace_id": "0x94f61d7bd16071fb6d826ccfefec5c83",
"span_id": "0x6947945713d7c017",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": null,
"start_time": "2022-08-26T03:11:19.586685Z",
"end_time": "2022-08-26T03:13:54.359962Z",
"status": {
"status_code": "UNSET"
},
"attributes": {},
"events": [],
"links": [],
"resource": {
"service.name": "service-pyshell"
}
}
The magic is done! You'll find that `second` is a child of `first` in the Grafana Cloud trace view!
Nice work digging into this! I believe this issue is about Tempo itself propagating traces when using the otel tracer. OTel tracing ingested by Tempo and produced by other applications should be fine.
> OTel tracing ingested by Tempo and produced by other applications should be fine.
@joe-elliott yes, you're right, but that wasn't completely obvious or trivial to me, so I wanted to understand it properly by implementing the whole thing myself. Sorry for going a little off topic here, but in the end this confirms that your quoted sentence above is indeed true. 😉
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.