dd-trace-dotnet icon indicating copy to clipboard operation
dd-trace-dotnet copied to clipboard

Guard from out of memory during probe processing

Open dudikeleti opened this issue 1 year ago • 1 comments

Summary of changes

There are cases that we reach out of memory during snapshot creation and we want to avoid it

Implementation details

We are using object pool and circuit memory breaker

dudikeleti avatar Oct 01 '24 12:10 dudikeleti

Datadog Report

Branch report: duidk/oom-exception Commit report: 9d1f871 Test service: dd-trace-dotnet

:white_check_mark: 0 Failed, 32284 Passed, 1606 Skipped, 24h 10m 5.67s Total Time :hourglass: 1 Performance Regression

:hourglass: Performance Regressions vs Default Branch (1)

  • Profiler_liveheap_cpu_walltime - scenarios 4.55s (+200.67ms, +5%) - Details

datadog-ddstaging[bot] avatar Oct 01 '24 12:10 datadog-ddstaging[bot]

Execution-Time Benchmarks Report :stopwatch:

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6105) - mean (70ms)  : 68, 72
     .   : milestone, 70,
    master - mean (70ms)  : 68, 71
     .   : milestone, 70,

    section CallTarget+Inlining+NGEN
    This PR (6105) - mean (1,111ms)  : 1089, 1132
     .   : milestone, 1111,
    master - mean (1,112ms)  : 1091, 1133
     .   : milestone, 1112,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6105) - mean (108ms)  : 106, 111
     .   : milestone, 108,
    master - mean (108ms)  : 106, 111
     .   : milestone, 108,

    section CallTarget+Inlining+NGEN
    This PR (6105) - mean (772ms)  : 759, 786
     .   : milestone, 772,
    master - mean (773ms)  : 760, 786
     .   : milestone, 773,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6105) - mean (92ms)  : 90, 93
     .   : milestone, 92,
    master - mean (92ms)  : 90, 94
     .   : milestone, 92,

    section CallTarget+Inlining+NGEN
    This PR (6105) - mean (728ms)  : 713, 744
     .   : milestone, 728,
    master - mean (733ms)  : 712, 753
     .   : milestone, 733,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6105) - mean (191ms)  : 188, 194
     .   : milestone, 191,
    master - mean (191ms)  : 186, 196
     .   : milestone, 191,

    section CallTarget+Inlining+NGEN
    This PR (6105) - mean (1,236ms)  : 1212, 1260
     .   : milestone, 1236,
    master - mean (1,225ms)  : 1193, 1257
     .   : milestone, 1225,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6105) - mean (277ms)  : 271, 282
     .   : milestone, 277,
    master - mean (277ms)  : 273, 281
     .   : milestone, 277,

    section CallTarget+Inlining+NGEN
    This PR (6105) - mean (950ms)  : 934, 965
     .   : milestone, 950,
    master - mean (955ms)  : 936, 974
     .   : milestone, 955,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6105) - mean (266ms)  : 261, 270
     .   : milestone, 266,
    master - mean (267ms)  : 261, 272
     .   : milestone, 267,

    section CallTarget+Inlining+NGEN
    This PR (6105) - mean (936ms)  : 912, 959
     .   : milestone, 936,
    master - mean (937ms)  : 916, 959
     .   : milestone, 937,

andrewlock avatar Oct 31 '24 17:10 andrewlock

Benchmarks Report for tracer :snail:

Benchmarks for #6105 compared to master:

  • 1 benchmarks are faster, with geometric mean 1.199
  • 2 benchmarks are slower, with geometric mean 1.170
  • 1 benchmarks have fewer allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with statistical test for significance of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 7.91μs 43.4ns 344ns 0.0156 0.00781 0 5.6 KB
master StartStopWithChild netcoreapp3.1 10.1μs 47.2ns 254ns 0.0204 0.0102 0 5.8 KB
master StartStopWithChild net472 16.5μs 66.3ns 257ns 1.03 0.299 0.0888 6.22 KB
#6105 StartStopWithChild net6.0 8.22μs 45.8ns 290ns 0.0162 0.00405 0 5.62 KB
#6105 StartStopWithChild netcoreapp3.1 9.8μs 53ns 295ns 0.0193 0.00964 0 5.8 KB
#6105 StartStopWithChild net472 16.4μs 39.2ns 152ns 1.05 0.319 0.0899 6.21 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 484μs 206ns 770ns 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 641μs 302ns 1.13μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 838μs 615ns 2.38μs 0.417 0 0 3.3 KB
#6105 WriteAndFlushEnrichedTraces net6.0 485μs 348ns 1.35μs 0 0 0 2.7 KB
#6105 WriteAndFlushEnrichedTraces netcoreapp3.1 639μs 211ns 761ns 0 0 0 2.7 KB
#6105 WriteAndFlushEnrichedTraces net472 831μs 478ns 1.79μs 0.414 0 0 3.3 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 204μs 1.16μs 8.85μs 0.202 0 0 18.73 KB
master SendRequest netcoreapp3.1 225μs 1.29μs 10.7μs 0.222 0 0 20.89 KB
master SendRequest net472 0.000142ns 9.84E‑05ns 0.000355ns 0 0 0 0 b
#6105 SendRequest net6.0 201μs 1.18μs 11.2μs 0.203 0 0 18.73 KB
#6105 SendRequest netcoreapp3.1 218μs 1.19μs 9.12μs 0.205 0 0 20.89 KB
#6105 SendRequest net472 0.00105ns 0.000391ns 0.00146ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed :heavy_check_mark: Fewer allocations :tada:

Fewer allocations :tada: in #6105

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑net6.0 41.84 KB 41.62 KB -221 B -0.53%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 583μs 3.26μs 21.4μs 0.561 0 0 41.84 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 695μs 3.54μs 16.2μs 0.34 0 0 41.83 KB
master WriteAndFlushEnrichedTraces net472 886μs 4.42μs 20.7μs 8.48 2.23 0.446 53.29 KB
#6105 WriteAndFlushEnrichedTraces net6.0 566μs 2.8μs 11.9μs 0.543 0 0 41.62 KB
#6105 WriteAndFlushEnrichedTraces netcoreapp3.1 680μs 3.33μs 13.7μs 0.336 0 0 41.74 KB
#6105 WriteAndFlushEnrichedTraces net472 848μs 3.52μs 13.6μs 8.25 2.6 0.434 53.29 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.35μs 1.33ns 4.6ns 0.0145 0 0 1.02 KB
master ExecuteNonQuery netcoreapp3.1 1.88μs 1.81ns 7.01ns 0.0133 0 0 1.02 KB
master ExecuteNonQuery net472 2.12μs 2.61ns 10.1ns 0.156 0.00106 0 987 B
#6105 ExecuteNonQuery net6.0 1.3μs 0.986ns 3.82ns 0.0143 0 0 1.02 KB
#6105 ExecuteNonQuery netcoreapp3.1 1.74μs 1.39ns 5.37ns 0.0138 0 0 1.02 KB
#6105 ExecuteNonQuery net472 2.11μs 3.03ns 11.7ns 0.156 0.00106 0 987 B
Benchmarks.Trace.ElasticsearchBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.22μs 0.387ns 1.5ns 0.014 0 0 976 B
master CallElasticsearch netcoreapp3.1 1.56μs 0.533ns 1.85ns 0.0132 0 0 976 B
master CallElasticsearch net472 2.49μs 0.974ns 3.51ns 0.158 0 0 995 B
master CallElasticsearchAsync net6.0 1.24μs 0.783ns 3.03ns 0.0131 0 0 952 B
master CallElasticsearchAsync netcoreapp3.1 1.67μs 0.821ns 2.96ns 0.0133 0 0 1.02 KB
master CallElasticsearchAsync net472 2.59μs 0.649ns 2.43ns 0.167 0 0 1.05 KB
#6105 CallElasticsearch net6.0 1.33μs 0.61ns 2.28ns 0.014 0 0 976 B
#6105 CallElasticsearch netcoreapp3.1 1.58μs 0.746ns 2.89ns 0.0128 0 0 976 B
#6105 CallElasticsearch net472 2.51μs 0.845ns 3.16ns 0.157 0 0 995 B
#6105 CallElasticsearchAsync net6.0 1.23μs 0.782ns 2.93ns 0.0135 0 0 952 B
#6105 CallElasticsearchAsync netcoreapp3.1 1.61μs 0.455ns 1.76ns 0.0137 0 0 1.02 KB
#6105 CallElasticsearchAsync net472 2.63μs 0.678ns 2.62ns 0.167 0 0 1.05 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.24μs 4.63ns 17.9ns 0.0133 0 0 952 B
master ExecuteAsync netcoreapp3.1 1.61μs 1.33ns 4.97ns 0.0121 0 0 952 B
master ExecuteAsync net472 1.76μs 0.468ns 1.75ns 0.145 0 0 915 B
#6105 ExecuteAsync net6.0 1.34μs 1.38ns 5.36ns 0.0128 0 0 952 B
#6105 ExecuteAsync netcoreapp3.1 1.61μs 1.19ns 4.47ns 0.0129 0 0 952 B
#6105 ExecuteAsync net472 1.78μs 0.339ns 1.22ns 0.145 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 4.35μs 2.27ns 8.79ns 0.0326 0 0 2.31 KB
master SendAsync netcoreapp3.1 5.23μs 2.36ns 8.52ns 0.0366 0 0 2.85 KB
master SendAsync net472 7.43μs 2.13ns 8.26ns 0.493 0 0 3.12 KB
#6105 SendAsync net6.0 4.43μs 1.64ns 5.7ns 0.031 0 0 2.31 KB
#6105 SendAsync netcoreapp3.1 5.37μs 1.55ns 5.8ns 0.0372 0 0 2.85 KB
#6105 SendAsync net472 7.55μs 1.69ns 6.55ns 0.494 0 0 3.12 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.43μs 1.43ns 5.36ns 0.0231 0 0 1.64 KB
master EnrichedLog netcoreapp3.1 2.3μs 1.4ns 5.43ns 0.023 0 0 1.64 KB
master EnrichedLog net472 2.51μs 0.725ns 2.81ns 0.249 0 0 1.57 KB
#6105 EnrichedLog net6.0 1.54μs 0.676ns 2.44ns 0.023 0 0 1.64 KB
#6105 EnrichedLog netcoreapp3.1 2.36μs 1.38ns 5.17ns 0.0223 0 0 1.64 KB
#6105 EnrichedLog net472 2.53μs 1.3ns 5.03ns 0.249 0 0 1.57 KB
Benchmarks.Trace.Log4netBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 118μs 266ns 1.03μs 0.0588 0 0 4.28 KB
master EnrichedLog netcoreapp3.1 122μs 273ns 1.06μs 0.061 0 0 4.28 KB
master EnrichedLog net472 153μs 189ns 733ns 0.691 0.23 0 4.46 KB
#6105 EnrichedLog net6.0 120μs 118ns 458ns 0 0 0 4.28 KB
#6105 EnrichedLog netcoreapp3.1 123μs 190ns 737ns 0 0 0 4.28 KB
#6105 EnrichedLog net472 155μs 165ns 639ns 0.698 0.233 0 4.46 KB
Benchmarks.Trace.NLogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 3.07μs 10.6ns 41.2ns 0.0306 0 0 2.2 KB
master EnrichedLog netcoreapp3.1 4.09μs 1.12ns 4.34ns 0.0306 0 0 2.2 KB
master EnrichedLog net472 4.97μs 1.36ns 5.27ns 0.32 0 0 2.02 KB
#6105 EnrichedLog net6.0 3.18μs 0.997ns 3.73ns 0.0302 0 0 2.2 KB
#6105 EnrichedLog netcoreapp3.1 4.33μs 0.601ns 2.25ns 0.0283 0 0 2.2 KB
#6105 EnrichedLog net472 4.95μs 1.18ns 4.58ns 0.32 0 0 2.02 KB
Benchmarks.Trace.RedisBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.41μs 1.49ns 5.76ns 0.0162 0 0 1.14 KB
master SendReceive netcoreapp3.1 1.7μs 0.581ns 2.09ns 0.0153 0 0 1.14 KB
master SendReceive net472 2.07μs 1.37ns 5.29ns 0.183 0 0 1.16 KB
#6105 SendReceive net6.0 1.33μs 0.786ns 3.04ns 0.016 0 0 1.14 KB
#6105 SendReceive netcoreapp3.1 1.72μs 0.657ns 2.46ns 0.0154 0 0 1.14 KB
#6105 SendReceive net472 2μs 1.21ns 4.53ns 0.183 0 0 1.16 KB
Benchmarks.Trace.SerilogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.64μs 1.6ns 6ns 0.0224 0 0 1.6 KB
master EnrichedLog netcoreapp3.1 3.97μs 1ns 3.75ns 0.0218 0 0 1.65 KB
master EnrichedLog net472 4.39μs 1.61ns 6.03ns 0.322 0 0 2.04 KB
#6105 EnrichedLog net6.0 2.73μs 0.753ns 2.82ns 0.0218 0 0 1.6 KB
#6105 EnrichedLog netcoreapp3.1 3.89μs 1.76ns 6.59ns 0.0213 0 0 1.65 KB
#6105 EnrichedLog net472 4.21μs 3.18ns 11.9ns 0.323 0 0 2.04 KB
Benchmarks.Trace.SpanBenchmark - Slower :warning: Same allocations :heavy_check_mark:

Slower :warning: in #6105

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑netcoreapp3.1 1.129 545.83 616.31

Faster :tada: in #6105

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net472 1.199 774.57 646.15

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 403ns 0.218ns 0.843ns 0.00809 0 0 576 B
master StartFinishSpan netcoreapp3.1 546ns 0.322ns 1.25ns 0.00788 0 0 576 B
master StartFinishSpan net472 775ns 0.826ns 3.2ns 0.0918 0 0 578 B
master StartFinishScope net6.0 487ns 0.419ns 1.57ns 0.00976 0 0 696 B
master StartFinishScope netcoreapp3.1 704ns 0.852ns 3.3ns 0.00926 0 0 696 B
master StartFinishScope net472 898ns 1.22ns 4.73ns 0.104 0 0 658 B
#6105 StartFinishSpan net6.0 395ns 0.334ns 1.29ns 0.00818 0 0 576 B
#6105 StartFinishSpan netcoreapp3.1 616ns 0.234ns 0.905ns 0.00764 0 0 576 B
#6105 StartFinishSpan net472 646ns 1.03ns 3.98ns 0.0916 0 0 578 B
#6105 StartFinishScope net6.0 475ns 0.35ns 1.36ns 0.00969 0 0 696 B
#6105 StartFinishScope netcoreapp3.1 668ns 0.324ns 1.26ns 0.0091 0 0 696 B
#6105 StartFinishScope net472 943ns 0.679ns 2.63ns 0.105 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Slower :warning: Same allocations :heavy_check_mark:

Slower :warning: in #6105

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑net6.0 1.213 592.90 719.09

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 593ns 0.508ns 1.97ns 0.00986 0 0 696 B
master RunOnMethodBegin netcoreapp3.1 972ns 0.904ns 3.5ns 0.00947 0 0 696 B
master RunOnMethodBegin net472 1.17μs 1.3ns 5.02ns 0.104 0 0 658 B
#6105 RunOnMethodBegin net6.0 719ns 0.443ns 1.72ns 0.00975 0 0 696 B
#6105 RunOnMethodBegin netcoreapp3.1 881ns 0.997ns 3.86ns 0.00927 0 0 696 B
#6105 RunOnMethodBegin net472 1.16μs 1.54ns 5.97ns 0.104 0 0 658 B

andrewlock avatar Oct 31 '24 21:10 andrewlock

Throughput/Crank Report :zap:

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6105) (11.196M)   : 0, 11196237
    master (11.131M)   : 0, 11130779
    benchmarks/2.9.0 (11.033M)   : 0, 11032866

    section Automatic
    This PR (6105) (7.297M)   : 0, 7297462
    master (7.310M)   : 0, 7310157
    benchmarks/2.9.0 (7.786M)   : 0, 7785853

    section Trace stats
    master (7.599M)   : 0, 7598556

    section Manual
    master (11.073M)   : 0, 11072548

    section Manual + Automatic
    This PR (6105) (6.761M)   : 0, 6760939
    master (6.775M)   : 0, 6775349

    section DD_TRACE_ENABLED=0
    master (10.251M)   : 0, 10251396

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6105) (9.518M)   : 0, 9518159
    master (9.713M)   : 0, 9712554
    benchmarks/2.9.0 (9.495M)   : 0, 9494821

    section Automatic
    This PR (6105) (6.462M)   : 0, 6462009
    master (6.514M)   : 0, 6513986

    section Trace stats
    master (6.710M)   : 0, 6710317

    section Manual
    master (9.670M)   : 0, 9669749

    section Manual + Automatic
    This PR (6105) (5.958M)   : 0, 5958088
    master (6.015M)   : 0, 6015065

    section DD_TRACE_ENABLED=0
    master (8.772M)   : 0, 8772254

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6105) (9.954M)   : 0, 9954144
    master (10.153M)   : 0, 10152752
    benchmarks/2.9.0 (10.020M)   : 0, 10019592

    section Automatic
    This PR (6105) (6.623M)   : 0, 6623416
    master (6.377M)   : 0, 6376641
    benchmarks/2.9.0 (7.255M)   : 0, 7255257

    section Trace stats
    master (7.126M)   : 0, 7126124

    section Manual
    master (10.105M)   : 0, 10104968

    section Manual + Automatic
    This PR (6105) (6.182M)   : 0, 6182248
    master (5.901M)   : 0, 5901347

    section DD_TRACE_ENABLED=0
    master (9.225M)   : 0, 9225046

andrewlock avatar Nov 07 '24 21:11 andrewlock