Automatic instrumentation for short-running applications
If a user wants to create a trace for a short-running Python application, they are required to add manual instrumentation. Specifically, a trace must be created and finished during the application lifecycle.
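For example, a script today has to do something like the following (a minimal sketch; the span name and `main` function are placeholders):

```python
from ddtrace import tracer

def main():
    ...  # the actual work of the script

if __name__ == "__main__":
    # The context manager finishes the span even if main() raises.
    with tracer.trace("cli.command"):
        main()
```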
One solution is to add automatic instrumentation of the sort proposed in #4298, where a trace is started in the library's sitecustomize.py and finished in an atexit handler.
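In rough terms the approach looks like this (a simplified sketch, not the actual code from #4298; the span name is a placeholder):

```python
# sitecustomize.py (simplified sketch)
import atexit

from ddtrace import tracer

# Start a root span as soon as the interpreter boots ...
span = tracer.trace("cli.command")

# ... and register a hook to finish it when the process exits.
atexit.register(span.finish)
```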
The Datadog APM PHP library provides a similar feature: https://docs.datadoghq.com/tracing/guide/trace-php-cli-scripts/#short-running-cli-scripts
This issue is intended to track discussion on adding similar functionality to the Python library.
Hi @majorgreys, thanks for creating the ticket. I created the original PR since I wanted this feature for our Kubernetes Python background job setup. I'd also be willing to make this code "merge-ready", since I currently use my fork in my system and no longer benefit from any updates other people make to dd-trace-py (unless I keep it manually up to date, which is annoying).
Our concrete use case:
I have a service that starts plain Python background jobs on Kubernetes (but they could also run in any other environment), and these run end to end, importing some data. Sometimes calls fail, and I would like to take advantage of Datadog's APM tracing to get an overview of how many jobs fail, when, and why. These jobs can also be triggered from a service request, which is why I also added the SPAN_ID/TRACE_ID environment variables so that the parent is correctly attached.
The proposal is:
Add environment variables (a sketch of how they could be wired up follows the list):

- `DD_START_TRACE_ON_STARTUP`: Starts a trace on application startup before anything else runs, and ends it when the application is done (I used the existing atexit handler for this in the PR).
- `DD_TRACE_ID` and `DD_SPAN_ID`: Set up the correct trace context for the new trace (could also use another serialization mechanism if there are any strong feelings about this).
- `DD_TRACE_NAME`: If provided, this is just a simple way to name the trace from the outside.
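A minimal sketch of how the startup hook could honor these variables, assuming the existing `HTTPPropagator` is reused to rebuild the parent context (the actual PR may wire this up differently):

```python
import atexit
import os

from ddtrace import tracer
from ddtrace.propagation.http import HTTPPropagator

if os.environ.get("DD_START_TRACE_ON_STARTUP"):
    trace_id = os.environ.get("DD_TRACE_ID")
    span_id = os.environ.get("DD_SPAN_ID")
    if trace_id and span_id:
        # Reuse the Datadog HTTP propagation headers to attach
        # the parent context handed down by the triggering service.
        context = HTTPPropagator.extract(
            {"x-datadog-trace-id": trace_id, "x-datadog-parent-id": span_id}
        )
        tracer.context_provider.activate(context)

    # Open the root span now and finish it at interpreter shutdown.
    span = tracer.trace(os.environ.get("DD_TRACE_NAME", "job"))
    atexit.register(span.finish)
```

With something like this in sitecustomize.py, `DD_START_TRACE_ON_STARTUP=1 ddtrace-run python job.py` would report one trace per job without touching the job's code.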
Alternatives that I considered but don't want to pursue:
- Touch every job's code directly, install ddtrace, and manually start a trace. Some of the code is hosted in other open source repositories, and I have over 100 different jobs that load data, so the generic `ddtrace-run` makes this much easier to "plug and play".
This feature is also relevant for me: https://github.com/DataDog/dd-trace-py/issues/2065. I just want to bring some attention to it, since I've also changed code to accomplish this :)