dd-trace-py

perf: reduce Span._finish_ns overhead in tracer and profiling

Open brettlangdon opened this issue 5 months ago • 3 comments

  • Remove Hooks class usage from BaseContextProvider and Tracer
  • Replace hook-based callbacks with core event dispatcher system
  • Optimize _update_active logic to reduce redundant finished() checks
  • Cache span._local_root property lookup in profiling link_span
  • Streamline span activation flow in DefaultContextProvider
  • Convert SimpleMovingAverage to Cython for C-speed mathematical operations
  • Add encoder caching to reduce redundant _trace_id_64bits property lookups
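
Several of the items above come down to the same idea: evaluate a property once and reuse the result instead of re-evaluating it on a hot path. The following is a minimal sketch of that pattern, not the code from this PR. The `Node` class, its `local_root` property, and the `describe_*` helpers are hypothetical stand-ins and are not part of ddtrace.

```python
class Node:
    """Hypothetical stand-in for a span-like object (not ddtrace's Span)."""

    lookups = 0  # counts how many times the property body has run

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    @property
    def local_root(self):
        # Walks the parent chain on every access, so each lookup has a cost.
        Node.lookups += 1
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def describe_uncached(span):
    # The property is evaluated twice: once for the check, once for the use.
    if span.local_root is not None:
        return span.local_root.name
    return None


def describe_cached(span):
    # The property is evaluated once and the result is kept in a local.
    root = span.local_root
    if root is not None:
        return root.name
    return None


root = Node("request")
leaf = Node("query", parent=Node("handler", parent=root))

Node.lookups = 0
describe_uncached(leaf)
uncached_lookups = Node.lookups

Node.lookups = 0
describe_cached(leaf)
cached_lookups = Node.lookups

print(uncached_lookups, cached_lookups)
```

Both helpers return the same value; the cached variant simply runs the property body fewer times, which is the kind of saving that shows up as a lower function call count.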

Testing against a locally running instance of PyPA Warehouse with a single gunicorn worker, with tracing and profiling enabled, we saw:

  • 4.9% improvement in requests per second (20.08 → 21.06 RPS)
  • 4.6% reduction in response time (49.79 ms → 47.48 ms)
  • 44,028 fewer function calls (-7.2%)
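
As a quick check of the arithmetic, the percentages can be recomputed from the before/after values quoted above. The `percent_change` helper is introduced here for illustration only, and the baseline call count is merely what the rounded 7.2% figure implies, so it is approximate.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


rps_change = percent_change(20.08, 21.06)      # requests per second
latency_change = percent_change(49.79, 47.48)  # response time in ms

print(f"RPS: {rps_change:+.1f}%")            # RPS: +4.9%
print(f"Response time: {latency_change:+.1f}%")  # Response time: -4.6%

# Approximate baseline implied by "44,028 fewer calls (-7.2%)".
# The 7.2% is rounded, so this is an estimate, not a measured value.
implied_baseline_calls = round(44_028 / 0.072)
print(implied_baseline_calls)
```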

Checklist

  • [ ] PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • [ ] Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

brettlangdon · Jun 19 '25 00:06

CODEOWNERS have been resolved as:

ddtrace/internal/sma.pyi                                                @DataDog/apm-core-python
ddtrace/internal/sma.pyx                                                @DataDog/apm-core-python
src/native/event_hub.rs                                                 @DataDog/apm-core-python
.gitignore                                                              @DataDog/apm-core-python
ddtrace/_trace/provider.py                                              @DataDog/apm-sdk-api-python
ddtrace/_trace/tracer.py                                                @DataDog/apm-sdk-api-python
ddtrace/internal/_encoding.pyx                                          @DataDog/apm-core-python
ddtrace/internal/core/__init__.py                                       @DataDog/apm-core-python
ddtrace/internal/core/event_hub.py                                      @DataDog/apm-core-python
ddtrace/internal/datadog/profiling/stack_v2/__init__.py                 @DataDog/profiling-python
ddtrace/internal/native/_native.pyi                                     @DataDog/apm-core-python
ddtrace/internal/writer/writer.py                                       @DataDog/apm-core-python
ddtrace/profiling/collector/stack.pyx                                   @DataDog/profiling-python
pyproject.toml                                                          @DataDog/python-guild
setup.py                                                                @DataDog/python-guild
src/native/lib.rs                                                       @DataDog/apm-core-python
tests/suitespec.yml                                                     @DataDog/python-guild @DataDog/apm-core-python

github-actions[bot] · Jun 19 '25 00:06

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 273 ± 3 ms.

The average import time from base is: 275 ± 2 ms.

The import time difference between this PR and base is: -2.5 ± 0.1 ms.
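
The report does not say how these numbers were collected. One way to get comparable per-module timings locally is CPython's `-X importtime` option, which writes an import time table to stderr. The sketch below assumes that approach; the `import_times_us` helper is hypothetical and is not the tool that produced this report.

```python
import subprocess
import sys


def import_times_us(module: str) -> dict[str, int]:
    """Return cumulative import time (microseconds) per imported module.

    Runs a fresh interpreter with -X importtime and parses its stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    times: dict[str, int] = {}
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        # Each row looks like: "import time: <self> | <cumulative> | <name>"
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        _self_us, cumulative_us, name = (part.strip() for part in parts)
        if not cumulative_us.isdigit():
            continue  # skip the header row
        times[name] = int(cumulative_us)
    return times


if __name__ == "__main__":
    # Print the five slowest imports triggered by importing "json".
    timings = import_times_us("json")
    for name, micros in sorted(timings.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{name}: {micros / 1000:.3f} ms")
```

A single run is noisy, so repeating the measurement and comparing averages, as the summary above does, gives a more reliable difference.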

Import time breakdown

The following import paths have grown:

ddtrace.auto 2.501 ms (0.92%)
ddtrace.bootstrap.sitecustomize 1.891 ms (0.69%)
ddtrace.bootstrap.preload 1.177 ms (0.43%)
ddtrace.internal.products 0.947 ms (0.35%)
importlib.metadata 0.947 ms (0.35%)
importlib.metadata._meta 0.660 ms (0.24%)
importlib.abc 0.083 ms (0.03%)
importlib.resources 0.083 ms (0.03%)
importlib.resources._common 0.083 ms (0.03%)
zipfile 0.072 ms (0.03%)
ddtrace.settings.profiling 0.142 ms (0.05%)
ddtrace.vendor.psutil 0.073 ms (0.03%)
ddtrace.vendor.psutil._pslinux 0.073 ms (0.03%)
ddtrace.vendor.psutil._psposix 0.073 ms (0.03%)
ddtrace.internal.datadog.profiling.stack_v2 0.070 ms (0.03%)
ddtrace.internal.datadog.profiling.stack_v2._stack_v2 0.070 ms (0.03%)
ddtrace.internal.runtime.runtime_metrics 0.088 ms (0.03%)
ddtrace.internal.runtime.metric_collectors 0.088 ms (0.03%)
ddtrace.appsec._common_module_patches 0.663 ms (0.24%)
ddtrace.appsec._asm_request_context 0.663 ms (0.24%)
ddtrace._trace.trace_handlers 0.051 ms (0.02%)
ddtrace 0.610 ms (0.22%)
ddtrace.trace 0.552 ms (0.20%)
ddtrace._trace.filters 0.445 ms (0.16%)
ddtrace._trace.processor 0.445 ms (0.16%)
ddtrace.internal.writer 0.138 ms (0.05%)
ddtrace.internal.writer.writer 0.138 ms (0.05%)
gzip 0.081 ms (0.03%)
ddtrace.internal.sma 0.057 ms (0.02%)
ddtrace.internal.dogstatsd 0.104 ms (0.04%)
ddtrace.vendor.dogstatsd 0.104 ms (0.04%)
ddtrace.vendor.dogstatsd.base 0.104 ms (0.04%)
queue 0.104 ms (0.04%)
ddtrace._trace.sampler 0.088 ms (0.03%)
ddtrace._trace.span 0.088 ms (0.03%)
ddtrace.internal.sampling 0.088 ms (0.03%)
ddtrace._trace.sampling_rule 0.088 ms (0.03%)
ddtrace._trace.tracer 0.106 ms (0.04%)
ddtrace.internal.debug 0.106 ms (0.04%)
ddtrace._monkey 0.058 ms (0.02%)
ddtrace.settings.asm 0.058 ms (0.02%)
ddtrace.appsec._constants 0.058 ms (0.02%)

The following import paths have shrunk:

ddtrace.auto 5.290 ms (1.94%)
ddtrace.bootstrap.sitecustomize 2.993 ms (1.10%)
ddtrace.bootstrap.preload 2.330 ms (0.85%)
ddtrace.internal.products 0.879 ms (0.32%)
importlib.metadata 0.879 ms (0.32%)
zipfile 0.632 ms (0.23%)
zipfile._path 0.632 ms (0.23%)
csv 0.104 ms (0.04%)
_csv 0.104 ms (0.04%)
importlib.abc 0.073 ms (0.03%)
importlib.resources 0.073 ms (0.03%)
importlib.resources._legacy 0.073 ms (0.03%)
importlib.metadata._itertools 0.070 ms (0.03%)
ddtrace.internal.remoteconfig.client 0.619 ms (0.23%)
ddtrace.internal.runtime.runtime_metrics 0.089 ms (0.03%)
ddtrace.internal.runtime.tag_collectors 0.089 ms (0.03%)
ddtrace.settings.profiling 0.054 ms (0.02%)
ddtrace.internal.datadog.profiling.ddup 0.054 ms (0.02%)
ddtrace.internal.datadog.profiling.ddup._ddup 0.054 ms (0.02%)
multiprocessing.sharedctypes 0.043 ms (0.02%)
multiprocessing.heap 0.043 ms (0.02%)
mmap 0.043 ms (0.02%)
ddtrace.appsec._common_module_patches 0.663 ms (0.24%)
ddtrace.appsec._asm_request_context 0.663 ms (0.24%)
ddtrace.appsec._utils 0.663 ms (0.24%)
ddtrace 2.298 ms (0.84%)
ddtrace._monkey 1.079 ms (0.40%)
ddtrace.appsec._listeners 0.965 ms (0.35%)
ddtrace.internal.core 0.965 ms (0.35%)
ddtrace.internal.core.event_hub 0.880 ms (0.32%)
ddtrace.vendor.packaging.specifiers 0.114 ms (0.04%)
ddtrace.trace 0.518 ms (0.19%)
ddtrace._trace.filters 0.382 ms (0.14%)
ddtrace._trace.processor 0.382 ms (0.14%)
ddtrace._trace.sampler 0.204 ms (0.07%)
ddtrace._trace.span 0.204 ms (0.07%)
pprint 0.121 ms (0.04%)
ddtrace.internal.dogstatsd 0.100 ms (0.04%)
ddtrace.vendor.dogstatsd 0.100 ms (0.04%)
ddtrace.vendor.dogstatsd.base 0.100 ms (0.04%)
ddtrace.vendor.dogstatsd.container 0.100 ms (0.04%)
ddtrace.internal.writer 0.078 ms (0.03%)
ddtrace.internal.writer.writer 0.078 ms (0.03%)
ddtrace._trace.tracer 0.099 ms (0.04%)
ddtrace.settings.peer_service 0.099 ms (0.04%)
ddtrace._trace.provider 0.037 ms (0.01%)
ddtrace.internal._unpatched 0.056 ms (0.02%)
json 0.030 ms (0.01%)
json.decoder 0.030 ms (0.01%)
re 0.030 ms (0.01%)
enum 0.030 ms (0.01%)
types 0.030 ms (0.01%)
subprocess 0.026 ms (0.01%)
contextlib 0.026 ms (0.01%)

github-actions[bot] · Jun 19 '25 00:06

Benchmarks

Benchmark execution time: 2025-06-20 20:32:31

Comparing candidate commit 50dc57cef55a70e987a24c01569aee999a9227e9 in PR branch brettlangdon/perf.optimize.span.finish with baseline commit 7a556744cd4dfc85a0284f486c73212bf7c633cb in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 561 metrics, 3 unstable metrics.

pr-commenter[bot] · Jun 19 '25 01:06

These changes were addressed in individual PRs.

brettlangdon · Jul 18 '25 18:07