profiler: enable endpoint call counts by default

Open felixge opened this issue 1 year ago • 6 comments

DO NOT MERGE YET

What does this PR do?

Enables https://github.com/DataDog/dd-trace-go/pull/1552 by default for users who have both profiling and tracing enabled. This adds a small critical section to the span creation hot path that updates counters in a map. However, the impact is not measurable in our span creation benchmarks (both the concurrent and the single-goroutine flavors), as demonstrated by this PR.
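For readers unfamiliar with the pattern, here is a minimal sketch of what "a small critical section updating counters in a map" can look like. It is not the code from the linked PR: the `endpointCounter` type and its `Inc` and `GetAndReset` methods are hypothetical names chosen for illustration, and the real implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// endpointCounter is a hypothetical stand-in for a per-endpoint call counter.
// The type and method names are illustrative, not dd-trace-go's API.
type endpointCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newEndpointCounter() *endpointCounter {
	return &endpointCounter{counts: make(map[string]uint64)}
}

// Inc is the part that would sit on the span creation hot path:
// a single map update guarded by a mutex.
func (c *endpointCounter) Inc(endpoint string) {
	c.mu.Lock()
	c.counts[endpoint]++
	c.mu.Unlock()
}

// GetAndReset returns the counts collected so far and swaps in a fresh map,
// as a periodic reader (for example, once per profile) might do.
func (c *endpointCounter) GetAndReset() map[string]uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	old := c.counts
	c.counts = make(map[string]uint64)
	return old
}

func main() {
	c := newEndpointCounter()
	var wg sync.WaitGroup
	// Eight goroutines contend on the same counter, 1000 increments each.
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Inc("GET /users")
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.GetAndReset()["GET /users"]) // prints 8000
}
```

The lock is held only for one map increment, which is why the cost stays small unless many goroutines create spans at the same instant.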

The absolute worst-case estimate I have is ~150ns of additional latency per span under maximum contention, which is probably 10-100x worse than anything achievable in the real world. Without contention, this feature should add ~20ns of latency per span (~1%). See this gist for more details.

Motivation

Provides a way to measure CPU time per request, which is very useful for evaluating the impact of profile-guided optimization.

Reviewer's Checklist

  • [ ] Changed code has unit tests for its functionality at or near 100% coverage.
  • [ ] System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • [ ] There is a benchmark for any new code, or changes to existing code.
  • [ ] If this interacts with the agent in a new way, a system test has been added.
  • [ ] Add an appropriate team label so this PR gets put in the right place for the release notes.
  • [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

For Datadog employees:

  • [ ] If this PR touches code that handles credentials of any kind, such as Datadog API keys, I've requested a review from @DataDog/security-design-and-guidance.
  • [ ] This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

felixge avatar Mar 01 '24 06:03 felixge

Benchmarks

Benchmark execution time: 2024-03-01 07:39:45

Comparing candidate commit 4233c7c408fbcdda3b68129ecc0b26f75547ffbb in PR branch felix.geisendoerfer/PROF-8816-enable-unit-of-work-by-default with baseline commit 5762cf1f61ea0db272d0f1e9ea957f8827c85b65 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 42 metrics, 2 unstable metrics.

pr-commenter[bot] avatar Mar 01 '24 07:03 pr-commenter[bot]

Note: PR https://github.com/DataDog/dd-trace-go/pull/1845 disabled BenchmarkConcurrentTracing due to reliability issues. I think that has created a gap in our ability to detect regressions related to contention problems. Going forward, PRs disabling benchmarks due to reliability issues should provide some breadcrumbs to understand what the problems were.

felixge avatar Mar 01 '24 07:03 felixge

Note: I checked the output of the benchmark pipeline and verified that BenchmarkConcurrentTracing was run. I also downloaded the results to sanity-check the conclusion: there is no measurable difference here.

felixge avatar Mar 01 '24 08:03 felixge

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Mar 22 '24 01:03 github-actions[bot]

This PR was closed because it has been open for 30 days with no activity.

github-actions[bot] avatar Apr 21 '24 01:04 github-actions[bot]

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar May 12 '24 01:05 github-actions[bot]

This PR was closed because it has been open for 30 days with no activity.

github-actions[bot] avatar Jun 11 '24 01:06 github-actions[bot]