
[Profiler] Cache Walltime callstacks based on `ucontext_t`

Open gleocadie opened this issue 1 year ago • 4 comments

Summary of changes

Reason for change

Implementation details

Test coverage

Other details

gleocadie avatar Sep 03 '24 13:09 gleocadie

Execution-Time Benchmarks Report :stopwatch:

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the total time it takes to execute a program and are intended to capture one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test at a 5% significance level
  • Only results indicating a difference greater than 5% and 5 ms are considered.
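
As a rough illustration of how these two criteria might combine, the Python sketch below computes Welch's t statistic with the Welch–Satterthwaite degrees of freedom and applies the 5% / 5 ms filter. The function names (`welch_t`, `exceeds_thresholds`) and the choice to measure the relative difference against the master mean are assumptions made for illustration; the benchmark tooling's actual implementation is not shown in this report.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    # Per-sample variance of the mean (sample variance divided by n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def exceeds_thresholds(master_ms, pr_ms, rel=0.05, abs_ms=5.0):
    """True only if the difference is larger than BOTH 5% and 5 ms."""
    diff = abs(pr_ms - master_ms)
    return diff > abs_ms and diff / master_ms > rel


# Means from the FakeDbCommand (.NET Framework 4.6.2) CallTarget section:
# 6 ms apart, but only about 0.55% of the master mean.
print(exceeds_thresholds(1083, 1089))  # False

# Hypothetical per-run timings, only to show the call shape.
t, df = welch_t([1080.0, 1085.0, 1090.0], [1083.0, 1089.0, 1095.0])
print(t, df)
```

The resulting `t` and `df` would then be compared against a Student's t distribution at the 5% level, for example with a statistics library; that step is left out to keep the sketch dependency-free.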

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
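
The report does not say how the p99 interval is derived from the mean and StdDev. One plausible reading, shown here purely as an assumption, is a two-sided 99% interval under a normal approximation, that is, the mean plus or minus roughly 2.576 standard deviations. The helper name `p99_interval` is hypothetical.

```python
from statistics import NormalDist


def p99_interval(mean_ms, std_dev_ms):
    """Two-sided 99% interval around the mean, assuming normally distributed runs."""
    # 0.995 quantile of the standard normal, approximately 2.576.
    z = NormalDist().inv_cdf(0.995)
    return mean_ms - z * std_dev_ms, mean_ms + z * std_dev_ms


# Hypothetical inputs: a 70 ms mean with a 1 ms standard deviation.
low, high = p99_interval(70.0, 1.0)
print(low, high)
```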

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5975) - mean (70ms)  : 67, 73
     .   : milestone, 70,
    master - mean (70ms)  : 67, 73
     .   : milestone, 70,

    section CallTarget+Inlining+NGEN
    This PR (5975) - mean (1,089ms)  : 1065, 1113
     .   : milestone, 1089,
    master - mean (1,083ms)  : 1050, 1116
     .   : milestone, 1083,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5975) - mean (109ms)  : 105, 113
     .   : milestone, 109,
    master - mean (109ms)  : 105, 112
     .   : milestone, 109,

    section CallTarget+Inlining+NGEN
    This PR (5975) - mean (762ms)  : 741, 783
     .   : milestone, 762,
    master - mean (760ms)  : 736, 783
     .   : milestone, 760,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5975) - mean (93ms)  : 90, 96
     .   : milestone, 93,
    master - mean (93ms)  : 89, 96
     .   : milestone, 93,

    section CallTarget+Inlining+NGEN
    This PR (5975) - mean (708ms)  : 692, 723
     .   : milestone, 708,
    master - mean (712ms)  : 691, 734
     .   : milestone, 712,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5975) - mean (190ms)  : 188, 193
     .   : milestone, 190,
    master - mean (191ms)  : 187, 194
     .   : milestone, 191,

    section CallTarget+Inlining+NGEN
    This PR (5975) - mean (1,164ms)  : 1140, 1189
     .   : milestone, 1164,
    master - mean (1,160ms)  : 1138, 1181
     .   : milestone, 1160,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5975) - mean (275ms)  : 269, 280
     .   : milestone, 275,
    master - mean (276ms)  : 272, 280
     .   : milestone, 276,

    section CallTarget+Inlining+NGEN
    This PR (5975) - mean (924ms)  : 900, 948
     .   : milestone, 924,
    master - mean (924ms)  : 900, 947
     .   : milestone, 924,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5975) - mean (265ms)  : 260, 269
     .   : milestone, 265,
    master - mean (265ms)  : 261, 268
     .   : milestone, 265,

    section CallTarget+Inlining+NGEN
    This PR (5975) - mean (910ms)  : 884, 935
     .   : milestone, 910,
    master - mean (906ms)  : 885, 928
     .   : milestone, 906,

andrewlock avatar Sep 03 '24 13:09 andrewlock

Datadog Report

Branch report: gleocadie/reuse-callstack-when-uniwinding-is-useless

Commit report: d5ed294

Test service: dd-trace-dotnet

:white_check_mark: 0 Failed, 364531 Passed, 2333 Skipped, 16h 37m 29.63s Total Time

datadog-ddstaging[bot] avatar Sep 03 '24 13:09 datadog-ddstaging[bot]

Benchmarks Report for tracer :snail:

Benchmarks for #5975 compared to master:

  • 2 benchmarks are faster, with a geometric mean base/diff ratio of 1.141
  • All benchmarks have the same allocations
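
The geometric mean quoted above can be reproduced from the base and diff medians listed further down for the two faster benchmarks (`RedisBenchmark.SendReceive` and `SpanBenchmark.StartFinishSpan` on net6.0). This is a small worked check rather than the bot's own code; the variable names are illustrative.

```python
import math

# (base median ns, diff median ns) taken from the "Faster" tables in this report.
medians = {
    "RedisBenchmark.SendReceive-net6.0": (1391.73, 1237.93),
    "SpanBenchmark.StartFinishSpan-net6.0": (465.01, 401.85),
}

# base/diff ratio per benchmark: values above 1 mean the PR is faster.
ratios = [base / diff for base, diff in medians.values()]

# Geometric mean = exp(mean of logs).
geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print([round(r, 3) for r in ratios])  # [1.124, 1.157]
print(round(geo_mean, 3))             # 1.141
```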

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test at a 5% significance level
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.
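
To make the two criteria concrete, here is a minimal sketch of the Mann–Whitney U statistic together with the 10% / 0.3 ns filter. Both function names are hypothetical, and taking the relative difference against the smaller of the two values (equivalent to requiring a ratio above 1.10) is an assumption; the report does not state which baseline the tooling uses.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: number of (x, y) pairs with x > y, ties count half."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


def speed_change_considered(base_ns, diff_ns, rel=0.10, abs_ns=0.3):
    """True only if the medians differ by more than BOTH 10% and 0.3 ns."""
    delta = abs(base_ns - diff_ns)
    return delta > abs_ns and delta / min(base_ns, diff_ns) > rel


# Medians reported for RedisBenchmark.SendReceive on net6.0: about 12% apart.
print(speed_change_considered(1391.73, 1237.93))  # True

# Means reported for SpanBenchmark.StartFinishScope on net6.0: about 1% apart.
print(speed_change_considered(479.0, 473.0))      # False

# Hypothetical samples: every value in the first is below every value in the second.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))       # 0.0
```

A U value near 0 or near `len(a) * len(b)` indicates well-separated samples; converting it to a p-value for the 5% level requires the U distribution or a normal approximation, which is omitted here.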

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartStopWithChild | net6.0 | 7.67μs | 42.7ns | 290ns | 0.0179 | 0.00716 | 0 | 5.43 KB |
| master | StartStopWithChild | netcoreapp3.1 | 9.96μs | 56.2ns | 394ns | 0.0195 | 0.00973 | 0 | 5.62 KB |
| master | StartStopWithChild | net472 | 16.2μs | 33.9ns | 127ns | 1.01 | 0.295 | 0.0957 | 6.07 KB |
| #5975 | StartStopWithChild | net6.0 | 7.83μs | 42.8ns | 260ns | 0.0157 | 0.00784 | 0 | 5.43 KB |
| #5975 | StartStopWithChild | netcoreapp3.1 | 10.2μs | 56.2ns | 351ns | 0.0254 | 0.0152 | 0 | 5.62 KB |
| #5975 | StartStopWithChild | net472 | 15.9μs | 63.2ns | 245ns | 1 | 0.287 | 0.0955 | 6.06 KB |

Benchmarks.Trace.AgentWriterBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | WriteAndFlushEnrichedTraces | net6.0 | 462μs | 314ns | 1.13μs | 0 | 0 | 0 | 2.7 KB |
| master | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 631μs | 601ns | 2.25μs | 0 | 0 | 0 | 2.7 KB |
| master | WriteAndFlushEnrichedTraces | net472 | 831μs | 359ns | 1.39μs | 0.414 | 0 | 0 | 3.3 KB |
| #5975 | WriteAndFlushEnrichedTraces | net6.0 | 477μs | 448ns | 1.74μs | 0 | 0 | 0 | 2.7 KB |
| #5975 | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 654μs | 349ns | 1.26μs | 0 | 0 | 0 | 2.7 KB |
| #5975 | WriteAndFlushEnrichedTraces | net472 | 858μs | 533ns | 1.85μs | 0.414 | 0 | 0 | 3.3 KB |

Benchmarks.Trace.AspNetCoreBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendRequest | net6.0 | 204μs | 1.18μs | 9.92μs | 0.208 | 0 | 0 | 18.45 KB |
| master | SendRequest | netcoreapp3.1 | 228μs | 1.32μs | 13μs | 0.215 | 0 | 0 | 20.61 KB |
| master | SendRequest | net472 | 0.00158ns | 0.000714ns | 0.00267ns | 0 | 0 | 0 | 0 b |
| #5975 | SendRequest | net6.0 | 204μs | 1.2μs | 11.5μs | 0.191 | 0 | 0 | 18.45 KB |
| #5975 | SendRequest | netcoreapp3.1 | 228μs | 1.34μs | 12.9μs | 0.213 | 0 | 0 | 20.61 KB |
| #5975 | SendRequest | net472 | 0.00203ns | 0.000667ns | 0.00258ns | 0 | 0 | 0 | 0 b |

Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | WriteAndFlushEnrichedTraces | net6.0 | 572μs | 3.18μs | 21.1μs | 0.568 | 0 | 0 | 41.46 KB |
| master | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 680μs | 3.72μs | 21.7μs | 0.342 | 0 | 0 | 41.6 KB |
| master | WriteAndFlushEnrichedTraces | net472 | 847μs | 4.2μs | 18.8μs | 8.33 | 2.5 | 0.417 | 53.32 KB |
| #5975 | WriteAndFlushEnrichedTraces | net6.0 | 565μs | 2.95μs | 13.8μs | 0.584 | 0 | 0 | 41.63 KB |
| #5975 | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 715μs | 3.91μs | 22.1μs | 0.372 | 0 | 0 | 41.74 KB |
| #5975 | WriteAndFlushEnrichedTraces | net472 | 872μs | 3.79μs | 14.7μs | 8.08 | 2.55 | 0.425 | 53.27 KB |

Benchmarks.Trace.DbCommandBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | ExecuteNonQuery | net6.0 | 1.31μs | 1.26ns | 4.87ns | 0.0144 | 0 | 0 | 1.02 KB |
| master | ExecuteNonQuery | netcoreapp3.1 | 1.81μs | 1.31ns | 5.09ns | 0.0135 | 0 | 0 | 1.02 KB |
| master | ExecuteNonQuery | net472 | 2.05μs | 1.85ns | 6.66ns | 0.157 | 0 | 0 | 987 B |
| #5975 | ExecuteNonQuery | net6.0 | 1.31μs | 1.45ns | 5.61ns | 0.0143 | 0 | 0 | 1.02 KB |
| #5975 | ExecuteNonQuery | netcoreapp3.1 | 1.76μs | 1.93ns | 6.94ns | 0.0137 | 0 | 0 | 1.02 KB |
| #5975 | ExecuteNonQuery | net472 | 2.1μs | 1.87ns | 7ns | 0.157 | 0 | 0 | 987 B |

Benchmarks.Trace.ElasticsearchBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | CallElasticsearch | net6.0 | 1.2μs | 0.83ns | 3.22ns | 0.0138 | 0 | 0 | 976 B |
| master | CallElasticsearch | netcoreapp3.1 | 1.58μs | 0.614ns | 2.13ns | 0.0133 | 0 | 0 | 976 B |
| master | CallElasticsearch | net472 | 2.4μs | 0.884ns | 3.19ns | 0.158 | 0 | 0 | 995 B |
| master | CallElasticsearchAsync | net6.0 | 1.22μs | 1ns | 3.74ns | 0.0134 | 0 | 0 | 952 B |
| master | CallElasticsearchAsync | netcoreapp3.1 | 1.61μs | 1.19ns | 4.6ns | 0.0137 | 0 | 0 | 1.02 KB |
| master | CallElasticsearchAsync | net472 | 2.57μs | 0.994ns | 3.85ns | 0.167 | 0 | 0 | 1.05 KB |
| #5975 | CallElasticsearch | net6.0 | 1.12μs | 0.598ns | 2.24ns | 0.0137 | 0 | 0 | 976 B |
| #5975 | CallElasticsearch | netcoreapp3.1 | 1.54μs | 0.528ns | 1.98ns | 0.0131 | 0 | 0 | 976 B |
| #5975 | CallElasticsearch | net472 | 2.39μs | 1.65ns | 6.39ns | 0.158 | 0 | 0 | 995 B |
| #5975 | CallElasticsearchAsync | net6.0 | 1.34μs | 1.47ns | 5.7ns | 0.0133 | 0 | 0 | 952 B |
| #5975 | CallElasticsearchAsync | netcoreapp3.1 | 1.59μs | 0.617ns | 2.31ns | 0.0135 | 0 | 0 | 1.02 KB |
| #5975 | CallElasticsearchAsync | net472 | 2.52μs | 1.67ns | 6.47ns | 0.166 | 0 | 0 | 1.05 KB |

Benchmarks.Trace.GraphQLBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | ExecuteAsync | net6.0 | 1.24μs | 0.342ns | 1.23ns | 0.0131 | 0 | 0 | 952 B |
| master | ExecuteAsync | netcoreapp3.1 | 1.67μs | 0.738ns | 2.86ns | 0.0125 | 0 | 0 | 952 B |
| master | ExecuteAsync | net472 | 1.83μs | 0.815ns | 3.15ns | 0.145 | 0 | 0 | 915 B |
| #5975 | ExecuteAsync | net6.0 | 1.33μs | 1.13ns | 4.23ns | 0.0133 | 0 | 0 | 952 B |
| #5975 | ExecuteAsync | netcoreapp3.1 | 1.52μs | 0.412ns | 1.6ns | 0.0122 | 0 | 0 | 952 B |
| #5975 | ExecuteAsync | net472 | 1.8μs | 1.27ns | 4.92ns | 0.145 | 0 | 0 | 915 B |

Benchmarks.Trace.HttpClientBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendAsync | net6.0 | 4.25μs | 2.39ns | 8.93ns | 0.0298 | 0 | 0 | 2.22 KB |
| master | SendAsync | netcoreapp3.1 | 5.02μs | 1.18ns | 4.59ns | 0.0376 | 0 | 0 | 2.76 KB |
| master | SendAsync | net472 | 7.92μs | 2.58ns | 10ns | 0.496 | 0 | 0 | 3.15 KB |
| #5975 | SendAsync | net6.0 | 4.12μs | 6.33ns | 24.5ns | 0.0305 | 0 | 0 | 2.22 KB |
| #5975 | SendAsync | netcoreapp3.1 | 5.06μs | 1.98ns | 7.68ns | 0.0381 | 0 | 0 | 2.76 KB |
| #5975 | SendAsync | net472 | 7.85μs | 1.41ns | 5.45ns | 0.496 | 0 | 0 | 3.15 KB |

Benchmarks.Trace.ILoggerBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 1.5μs | 0.651ns | 2.52ns | 0.0234 | 0 | 0 | 1.64 KB |
| master | EnrichedLog | netcoreapp3.1 | 2.28μs | 0.98ns | 3.53ns | 0.0225 | 0 | 0 | 1.64 KB |
| master | EnrichedLog | net472 | 2.77μs | 1.05ns | 4.08ns | 0.249 | 0 | 0 | 1.57 KB |
| #5975 | EnrichedLog | net6.0 | 1.59μs | 1.03ns | 3.71ns | 0.023 | 0 | 0 | 1.64 KB |
| #5975 | EnrichedLog | netcoreapp3.1 | 2.3μs | 0.618ns | 2.14ns | 0.0221 | 0 | 0 | 1.64 KB |
| #5975 | EnrichedLog | net472 | 2.69μs | 1.91ns | 7.38ns | 0.249 | 0 | 0 | 1.57 KB |

Benchmarks.Trace.Log4netBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 115μs | 197ns | 761ns | 0.0577 | 0 | 0 | 4.28 KB |
| master | EnrichedLog | netcoreapp3.1 | 120μs | 119ns | 446ns | 0 | 0 | 0 | 4.28 KB |
| master | EnrichedLog | net472 | 149μs | 356ns | 1.38μs | 0.658 | 0.219 | 0 | 4.46 KB |
| #5975 | EnrichedLog | net6.0 | 114μs | 143ns | 556ns | 0 | 0 | 0 | 4.28 KB |
| #5975 | EnrichedLog | netcoreapp3.1 | 119μs | 211ns | 817ns | 0.0593 | 0 | 0 | 4.28 KB |
| #5975 | EnrichedLog | net472 | 147μs | 146ns | 567ns | 0.661 | 0.22 | 0 | 4.46 KB |

Benchmarks.Trace.NLogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 2.95μs | 0.546ns | 1.97ns | 0.0312 | 0 | 0 | 2.2 KB |
| master | EnrichedLog | netcoreapp3.1 | 4.08μs | 1.31ns | 4.91ns | 0.0286 | 0 | 0 | 2.2 KB |
| master | EnrichedLog | net472 | 4.82μs | 1.43ns | 5.55ns | 0.32 | 0 | 0 | 2.02 KB |
| #5975 | EnrichedLog | net6.0 | 3μs | 0.842ns | 3.26ns | 0.0312 | 0 | 0 | 2.2 KB |
| #5975 | EnrichedLog | netcoreapp3.1 | 4.27μs | 1.36ns | 5.07ns | 0.0282 | 0 | 0 | 2.2 KB |
| #5975 | EnrichedLog | net472 | 4.82μs | 2.02ns | 7.82ns | 0.32 | 0 | 0 | 2.02 KB |

Benchmarks.Trace.RedisBenchmark - Faster :tada: Same allocations :heavy_check_mark:

Faster :tada: in #5975

| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.RedisBenchmark.SendReceive‑net6.0 | 1.124 | 1,391.73 | 1,237.93 | |

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendReceive | net6.0 | 1.39μs | 1.02ns | 3.95ns | 0.0159 | 0 | 0 | 1.14 KB |
| master | SendReceive | netcoreapp3.1 | 1.77μs | 0.907ns | 3.51ns | 0.015 | 0 | 0 | 1.14 KB |
| master | SendReceive | net472 | 2.08μs | 0.893ns | 3.34ns | 0.183 | 0.00104 | 0 | 1.16 KB |
| #5975 | SendReceive | net6.0 | 1.24μs | 0.191ns | 0.688ns | 0.016 | 0 | 0 | 1.14 KB |
| #5975 | SendReceive | netcoreapp3.1 | 1.74μs | 2.98ns | 11.5ns | 0.0155 | 0 | 0 | 1.14 KB |
| #5975 | SendReceive | net472 | 2.2μs | 1.36ns | 5.26ns | 0.183 | 0 | 0 | 1.16 KB |

Benchmarks.Trace.SerilogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 2.87μs | 2.37ns | 8.55ns | 0.0214 | 0 | 0 | 1.6 KB |
| master | EnrichedLog | netcoreapp3.1 | 3.86μs | 3.85ns | 14.9ns | 0.0211 | 0 | 0 | 1.65 KB |
| master | EnrichedLog | net472 | 4.31μs | 1.72ns | 6.65ns | 0.324 | 0 | 0 | 2.04 KB |
| #5975 | EnrichedLog | net6.0 | 2.77μs | 0.908ns | 3.52ns | 0.0222 | 0 | 0 | 1.6 KB |
| #5975 | EnrichedLog | netcoreapp3.1 | 3.89μs | 1.24ns | 4.8ns | 0.0231 | 0 | 0 | 1.65 KB |
| #5975 | EnrichedLog | net472 | 4.46μs | 2.2ns | 8.51ns | 0.322 | 0 | 0 | 2.04 KB |

Benchmarks.Trace.SpanBenchmark - Faster :tada: Same allocations :heavy_check_mark:

Faster :tada: in #5975

| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 | 1.157 | 465.01 | 401.85 | |

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan | net6.0 | 465ns | 0.283ns | 1.09ns | 0.00815 | 0 | 0 | 576 B |
| master | StartFinishSpan | netcoreapp3.1 | 563ns | 0.805ns | 3.12ns | 0.00764 | 0 | 0 | 576 B |
| master | StartFinishSpan | net472 | 639ns | 0.718ns | 2.78ns | 0.0916 | 0 | 0 | 578 B |
| master | StartFinishScope | net6.0 | 479ns | 0.368ns | 1.43ns | 0.00976 | 0 | 0 | 696 B |
| master | StartFinishScope | netcoreapp3.1 | 726ns | 0.776ns | 3ns | 0.00933 | 0 | 0 | 696 B |
| master | StartFinishScope | net472 | 842ns | 0.8ns | 3.1ns | 0.104 | 0 | 0 | 658 B |
| #5975 | StartFinishSpan | net6.0 | 401ns | 0.277ns | 1.07ns | 0.00808 | 0 | 0 | 576 B |
| #5975 | StartFinishSpan | netcoreapp3.1 | 581ns | 0.329ns | 1.27ns | 0.00784 | 0 | 0 | 576 B |
| #5975 | StartFinishSpan | net472 | 679ns | 0.63ns | 2.44ns | 0.0915 | 0 | 0 | 578 B |
| #5975 | StartFinishScope | net6.0 | 473ns | 0.678ns | 2.45ns | 0.00986 | 0 | 0 | 696 B |
| #5975 | StartFinishScope | netcoreapp3.1 | 745ns | 0.389ns | 1.45ns | 0.00972 | 0 | 0 | 696 B |
| #5975 | StartFinishScope | net472 | 847ns | 1ns | 3.87ns | 0.104 | 0 | 0 | 658 B |

Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin | net6.0 | 645ns | 0.308ns | 1.19ns | 0.00999 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | netcoreapp3.1 | 946ns | 0.623ns | 2.41ns | 0.00939 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | net472 | 1.07μs | 0.767ns | 2.97ns | 0.104 | 0 | 0 | 658 B |
| #5975 | RunOnMethodBegin | net6.0 | 611ns | 0.65ns | 2.52ns | 0.00976 | 0 | 0 | 696 B |
| #5975 | RunOnMethodBegin | netcoreapp3.1 | 986ns | 1.26ns | 4.88ns | 0.00951 | 0 | 0 | 696 B |
| #5975 | RunOnMethodBegin | net472 | 1.13μs | 0.526ns | 2.04ns | 0.105 | 0 | 0 | 658 B |

andrewlock avatar Sep 04 '24 16:09 andrewlock

Throughput/Crank Report :zap:

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where the throughput results for the PR are worse than latest master (a drop of 5% or more) are shown in red.
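
The 5% rule can be checked by hand against the request totals in the charts that follow. The sketch below is a worked example under the assumption that the drop is measured relative to the master total; the function names are hypothetical and not taken from the reporting tool.

```python
def throughput_change(pr_requests, master_requests):
    """Relative change of the PR total versus master (negative means a drop)."""
    return (pr_requests - master_requests) / master_requests


def is_flagged(pr_requests, master_requests, max_drop=0.05):
    """True when the PR total is at least 5% below the master total."""
    return throughput_change(pr_requests, master_requests) <= -max_drop


# Linux x64, "Automatic" section: PR 7,501,877 vs master 7,292,565 requests.
linux_auto = throughput_change(7501877, 7292565)
print(f"{linux_auto:+.2%}", is_flagged(7501877, 7292565))      # +2.87% False

# Windows x64, "Baseline" section: PR 10,047,828 vs master 10,219,514 requests.
win_base = throughput_change(10047828, 10219514)
print(f"{win_base:+.2%}", is_flagged(10047828, 10219514))      # -1.68% False
```

Neither comparison reaches the 5% threshold, so neither would be shown in red under this reading of the rule.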

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5975) (11.232M)   : 0, 11231934
    master (11.149M)   : 0, 11149005
    benchmarks/2.9.0 (11.197M)   : 0, 11196694

    section Automatic
    This PR (5975) (7.502M)   : 0, 7501877
    master (7.293M)   : 0, 7292565
    benchmarks/2.9.0 (7.764M)   : 0, 7763676

    section Trace stats
    master (7.789M)   : 0, 7788853

    section Manual
    master (11.289M)   : 0, 11289176

    section Manual + Automatic
    This PR (5975) (6.901M)   : 0, 6900514
    master (6.886M)   : 0, 6885775

    section DD_TRACE_ENABLED=0
    master (10.291M)   : 0, 10291337

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5975) (9.553M)   : 0, 9553205
    benchmarks/2.9.0 (9.684M)   : 0, 9683707

    section Automatic
    This PR (5975) (6.498M)   : 0, 6497882

    section Manual + Automatic
    This PR (5975) (6.066M)   : 0, 6066383

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5975) (10.048M)   : 0, 10047828
    master (10.220M)   : 0, 10219514

    section Automatic
    This PR (5975) (6.869M)   : 0, 6869289
    master (6.882M)   : 0, 6881724

    section Trace stats
    master (7.404M)   : 0, 7403605

    section Manual
    master (10.253M)   : 0, 10252749

    section Manual + Automatic
    This PR (5975) (6.451M)   : 0, 6450568
    master (6.482M)   : 0, 6481560

    section DD_TRACE_ENABLED=0
    master (9.525M)   : 0, 9524930

andrewlock avatar Sep 04 '24 19:09 andrewlock