Allow the OpenTelemetry tracer to be used with the profiler
Which version of dd-trace-py are you using?
1.1.3
Which version of pip are you using?
22.0.3
Which version of the libraries are you using?
OpenTelemetry 1.12.0rc1
How can we reproduce your problem?
I use OpenTelemetry for all span/trace instrumentation. I would like to use the traces/spans from OpenTelemetry to give context to the ddtrace Profiler. Here's an example of what I would like to do:
from ddtrace.profiling import Profiler
from opentelemetry import trace

otel_tracer = trace.get_tracer(__name__)

Profiler(
    env="prod",
    service="myservice",
    version="abc123",
    url="http://datadog-host.internal",
    tracer=otel_tracer,
)
Alternatively, is there perhaps a bridge that would let the default Datadog tracer from ddtrace.tracer simply inherit the trace/span information from OpenTelemetry?
@phillipuniverse Did you find any way to work around this? (We are looking to do exactly the same.)
@shahargl not yet but I hope to make some investments in this area within the next month!
Potentially relevant is https://github.com/DataDog/dd-trace-py/pull/4170 where the DD tracer now has knowledge of the OpenTelemetry traceparent headers.
Looks like #4170 was reverted, base issue at #3872.
I went down a pretty long and involved process trying to build a bridge between the OpenTelemetry tracer and the Datadog tracer. I couldn't quite get the stack profiles to work though and I think in general it's just too shaky. I'm going to try a different approach where I do the following:
- Take some of the work in #4170 (that unfortunately got reverted) and use that to start a Datadog tracer via propagation alongside the otel tracer
- Disable any span/trace exports that the Datadog tracer might generate since they are already captured by OpenTelemetry
This has other downsides, though. I would like to avoid the Datadog instrumentation entirely so I don't have to worry about disabling span exports, but without it I will have to manually establish the tracer in the Django/FastAPI/pika/Celery/etc. entrypoints in my services. I need to think more about this.
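Roughly what I have in mind for the propagation piece, as a sketch (HTTPPropagator.extract and tracer.context_provider.activate are the documented ddtrace propagation hooks; request_headers is a placeholder for whatever dict of incoming headers your framework exposes):

from ddtrace import tracer
from ddtrace.propagation.http import HTTPPropagator

def activate_dd_context(request_headers):
    # Build a Datadog Context from the propagation headers on the incoming
    # request (the same headers the otel SDK consumes), so both tracers
    # agree on trace/span ids.
    dd_context = HTTPPropagator.extract(request_headers)
    # Make it the active Datadog context so spans started afterwards join
    # the same trace.
    tracer.context_provider.activate(dd_context)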
At any rate, here is a gist with my sort-of-working code. Profiles do show up, but they are not linked to spans, so it doesn't truly work yet.
Below is a lot of research I did into this, all of which has been a dead end so far. At this point I need some additional advice from Datadog folks on what exactly needs to be set where in the internals of the Datadog tracer to get things synced correctly between DD and otel. I'm going to pause work on this until I get additional feedback.
Summary of the issues below:
- The profiler at some point does see trace id/span ids
- The profiler seems to create a bunch of extra stack trace events and keeps around the last span id for a period of time. The Datadog span is almost certainly not being "finished" correctly
- The bridge between the otel span and the DD span is currently designed as a "point in time" conversion rather than a constant sync. Attributes need to be synced constantly between the otel span and the DD span to detect whether the span is finished.
I think the final conclusion of this is probably that the bridge will work like the existing opentracing implementation. @Kyle-Verhoog it looks like you were one of the initial drivers there; do you have some pointers on the important pieces of that bridge that could apply to OpenTelemetry support? Specifically, what needs to be synced on the Datadog side?
A couple of other thoughts:
- Is the OpenTelemetry context API the best spot to monkeypatch?
- OpenTelemetry has a nice SpanProcessor abstraction. This is mainly used by the exporters, but maybe this is the right place to "start" and "end" the Datadog spans? A rough sketch of that idea follows below.
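To make that second idea concrete, here is a minimal sketch of a SpanProcessor that drives a matching Datadog span for every otel span. The id reassignment assumes ddtrace Span ids are plain writable attributes (and truncates the 128-bit otel trace id to Datadog's 64 bits), which may not hold across versions, so treat it as an illustration rather than a working bridge:

from opentelemetry.sdk.trace import SpanProcessor

from ddtrace import tracer as dd_tracer


class DatadogBridgeSpanProcessor(SpanProcessor):
    def __init__(self):
        self._dd_spans = {}  # otel span id -> ddtrace span

    def on_start(self, span, parent_context=None):
        otel_ctx = span.get_span_context()
        # Start (and activate) a Datadog span mirroring the otel span.
        dd_span = dd_tracer.trace(span.name)
        # Assumption: these ids are writable. Otel trace ids are 128-bit,
        # Datadog's are 64-bit, so keep the low 64 bits.
        dd_span.trace_id = otel_ctx.trace_id & ((1 << 64) - 1)
        dd_span.span_id = otel_ctx.span_id
        self._dd_spans[otel_ctx.span_id] = dd_span

    def on_end(self, span):
        # Finish the mirrored Datadog span when the otel span ends.
        dd_span = self._dd_spans.pop(span.get_span_context().span_id, None)
        if dd_span is not None:
            dd_span.finish()

It would be registered with trace.get_tracer_provider().add_span_processor(DatadogBridgeSpanProcessor()) on the SDK TracerProvider.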
I have an update, but it's not really good news. I started going down the path of using the Datadog instrumentation and disabling export. I found a good workaround in this issue for disabling the trace writer, which works great!
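The shape of it (not the exact snippet from that issue; if I understand it correctly, enabled=False keeps the instrumentation creating spans but short-circuits the export, so verify against your ddtrace version):

from ddtrace import tracer

# Keep the Datadog instrumentation, and therefore active spans for the
# profiler to attach to, but never ship the resulting traces to the agent.
tracer.configure(enabled=False)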
But it's still not quite giving me what I want, because I still have to sync the Datadog span/trace with the ids of the otel span/trace; everything has to line up for profiler events to be associated in the APM. I couldn't find a great way to do that. Maybe something like monkeypatching the Span constructor to hardcode the otel span/trace ids? But then there's another complication: ensuring the Datadog tracer is always established after the otel one. And it wouldn't work in places where you create a manual span via otel. I think the only viable solution, then, is to have the Datadog side hook into the otel side, like I thought originally.
So back to my gist. I think I have a better bridge between otel and DD through the OpenTelemetry context API, the low-level layer that truly activates/deactivates spans. My goal is to take the otel context API and use it to sync changes into the DD context API.
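As a very rough illustration of that hook (the full version is in the gist; _activate_dd_from_otel is a hypothetical helper standing in for the otel-span-to-Datadog-context conversion):

import opentelemetry.context as otel_context
from opentelemetry import trace

_original_attach = otel_context.attach


def _patched_attach(context):
    # attach() is the choke point every otel "make this span current"
    # operation goes through, so wrapping it mirrors activation into ddtrace.
    token = _original_attach(context)
    otel_span = trace.get_current_span(context)
    if otel_span.get_span_context().is_valid:
        _activate_dd_from_otel(otel_span)  # hypothetical bridge helper
    return token


otel_context.attach = _patched_attach

detach() would need the symmetric treatment to deactivate the Datadog context, and modules that imported attach directly before the patch won't see it.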
This works slightly better. In the gist I include a monkeypatch to the DD Recorder, which is responsible for exporting profiles, with some simple prints to figure out what the current trace and span ids are. When I hit an endpoint in my Django app, I do see it printing the span id. But you'll see that the _local_root is None, which, based on my research below, seems to be required. I am obviously not setting that in my bridge that converts an otel span into a Datadog span:
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 3478091631850939025 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 3478091631850939025 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 3478091631850939025 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 3478091631850939025 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 14404735425225168374 and local root None
backend-core-core-1 | Recording event StackSampleEvent with span 3478091631850939025 and local root None
Another weird thing is that it is constantly ping-ponging between span ids well after the request is over, like it's continuing to get profile events for a previous span even after that span is done. My guess is that I need to be stopping/detaching something but I don't know where.
Then I tried to validate what happens without OpenTelemetry, using just the manual Datadog patching. So I included this:
from ddtrace import config, patch

config.env = ENVIRONMENT  # the environment the application is in
config.service = SERVICE  # name of your application
config.version = VERSION  # version of your application

patch(django=True)
And voila, I got span ids and they were all hooked up to the Datadog APM correctly. I also only see this logging when I would expect it, in the context of an API request, not persistently like before.
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
backend-core-core-1 | Recording event StackSampleEvent with span 15632444396742362186 and local root 7301043703004759137
@phillipuniverse sorry for the long delay responding here. Would you be willing to send me an email at [email protected]? I'd love to try and schedule a call with you to discuss this topic!
I'm fairly sure that this issue was fixed by https://github.com/DataDog/dd-trace-py/pull/5140, which was merged recently. @mabdinur please reopen this issue if I'm mistaken.
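For anyone finding this later, the OpenTelemetry API support in recent ddtrace releases is enabled by pointing the otel API at ddtrace's TracerProvider (module path per the ddtrace docs; the exact setup may differ by release):

from opentelemetry.trace import set_tracer_provider

from ddtrace.opentelemetry import TracerProvider

# Route all OpenTelemetry API spans through ddtrace.
set_tracer_provider(TracerProvider())

from opentelemetry import trace

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("my-operation"):
    ...  # spans created here are backed by ddtrace spans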
Whoops! The new OpenTelemetry integration does not yet link profiles with otel spans.
Manually doing what the GitHub Action was supposed to do (fixed in https://github.com/DataDog/dd-trace-py/pull/7835):
This issue has been automatically closed after six months of inactivity. If it's a feature request, it has been added to the maintainers' internal backlog and will be included in an upcoming round of feature prioritization. Please comment or reopen if you think this issue was closed in error.