
[Tracer] Stats Computation: Disable stats computation and/or dropping P0 traces when the agent is not compatible

Open zacharycmontoya opened this issue 3 years ago • 1 comments

Summary of changes

When stats computation is enabled, the tracer sends a request to the trace agent's /info endpoint to ensure that this feature is supported by the trace agent. Based on the response, either the entire feature is disabled or only the dropping of P0 traces is disabled.

Built on top of https://github.com/DataDog/dd-trace-dotnet/pull/3048, this is PR 3 of 3 in an effort to break down the large PR https://github.com/DataDog/dd-trace-dotnet/pull/2988

Reason for change

Implementation details

StatsAggregator initialization now executes Endpoint Discovery

Background: When stats computation is enabled, the StatsAggregator object is responsible for buffering stats and periodically sending them to the trace agent's stats endpoint. Stats are computed by passing an array of spans to the AddRange API, which the AgentWriter calls when a local trace is finished and is being buffered/serialized.

Much like the LiveDebugger initialization, the constructor of StatsAggregator starts a task on the thread pool that queries the agent's /info endpoint to check which endpoints and features it supports. We specifically check for two properties:

  • endpoints[] contains "/v0.6/stats". The true/false result is stored in StatsAggregator.CanComputeStats.
  • client_drop_p0s is set to true. The true/false result is stored in StatsAggregator.CanDropP0s.
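
To make the two checks concrete, here is a minimal Python sketch of how a client could derive both flags from an /info response body. It is not the tracer's C# implementation: the names `AgentCapabilities` and `parse_agent_info` are hypothetical, and gating P0 dropping on stats support is an assumption based on the summary above (either the whole feature is disabled, or only P0 dropping is).

```python
import json
from dataclasses import dataclass

STATS_ENDPOINT = "/v0.6/stats"  # endpoint name taken from the PR description


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical holder for the two flags described above."""
    can_compute_stats: bool
    can_drop_p0s: bool


def parse_agent_info(body: str) -> AgentCapabilities:
    """Derive the two flags from the JSON body of an /info response."""
    info = json.loads(body)
    endpoints = info.get("endpoints") or []
    can_compute_stats = STATS_ENDPOINT in endpoints
    # Assumption: dropping P0 traces only makes sense when stats can be
    # computed, so the second flag is gated on the first.
    can_drop_p0s = can_compute_stats and info.get("client_drop_p0s") is True
    return AgentCapabilities(can_compute_stats, can_drop_p0s)
```

An agent that lists the stats endpoint but omits `client_drop_p0s` would yield `can_compute_stats=True` and `can_drop_p0s=False`, which corresponds to the "stats allowed, P0 dropping disabled" case.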

Note: Since spans and traces can finish while endpoint discovery is still in progress, we let the StatsAggregator store stats points during that time. If initialization succeeds, the stats aggregator can start sending stats to the trace agent. If initialization fails, the stats aggregator stops storing new stats points and refuses to send stats to the trace agent.
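
The behavior in this note can be modeled as a small three-state buffer. The following Python sketch is illustrative only: `StatsBuffer`, `DiscoveryState`, and their methods are hypothetical names, and discarding already-buffered points when discovery fails is an assumption the description does not spell out.

```python
from enum import Enum


class DiscoveryState(Enum):
    PENDING = "pending"          # the /info request has not completed yet
    SUPPORTED = "supported"      # the agent accepts stats
    UNSUPPORTED = "unsupported"  # the agent does not accept stats


class StatsBuffer:
    """Hypothetical model of buffering while discovery is in flight."""

    def __init__(self):
        self.state = DiscoveryState.PENDING
        self.points = []

    def add(self, point):
        """Store a point unless discovery has already failed."""
        if self.state is DiscoveryState.UNSUPPORTED:
            return False
        self.points.append(point)
        return True

    def complete_discovery(self, can_compute_stats):
        """Record the outcome of the discovery request."""
        if can_compute_stats:
            self.state = DiscoveryState.SUPPORTED
        else:
            self.state = DiscoveryState.UNSUPPORTED
            # Assumption: points buffered during discovery are discarded.
            self.points.clear()

    def flush(self):
        """Return the batch to send, or None when sending is not allowed."""
        if self.state is not DiscoveryState.SUPPORTED:
            return None
        batch, self.points = self.points, []
        return batch
```

Points added while the state is `PENDING` are kept, so nothing is lost if the agent turns out to support stats; `flush` returns `None` until discovery has succeeded.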

Small changes

  • The DiscoveryService.Create method was refactored because the IConfigurationSource parameter was redundant.
  • The IDiscoveryService interface now also exposes a StatsEndpoint property and a ClientDropP0s property.
  • The MockTracerAgent test class and all of its subclasses have been updated to accept an AgentConfiguration object which allows us to mock different agent capabilities. This was used to test scenarios where stats computation was allowed but not the dropping of P0 spans.
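
As a rough illustration of the mock-agent idea in the last bullet, the Python sketch below shows a configuration object that controls what a fake /info response advertises. `MockAgentConfiguration` and `info_response` are hypothetical names, the default endpoint list is an illustrative assumption, and none of this mirrors the actual AgentConfiguration class in the test suite.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class MockAgentConfiguration:
    """Hypothetical knobs for the capabilities a fake agent advertises."""
    # Assumption: an illustrative default endpoint list.
    endpoints: List[str] = field(
        default_factory=lambda: ["/v0.4/traces", "/v0.6/stats"]
    )
    client_drop_p0s: bool = True

    def info_response(self) -> str:
        """Build the JSON body a mock /info endpoint would return."""
        return json.dumps(
            {"endpoints": self.endpoints, "client_drop_p0s": self.client_drop_p0s}
        )
```

Constructing `MockAgentConfiguration(client_drop_p0s=False)` would model the scenario mentioned above, where stats computation is allowed but dropping P0 spans is not.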

Test coverage

Adds integration tests via StatsTests

Other details

zacharycmontoya • Aug 03 '22 22:08

Benchmarks Report :snail:

Benchmarks for #3049 compared to master:

  • 1 benchmark is slower, with geometric mean 1.143
  • 2 benchmarks have more allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with a significance level of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.
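
The size thresholds above can be written as two simple filters. This Python sketch is an interpretation, not the benchmark tool's code: the function names are hypothetical, "greater than 10% and 0.3 ns" is read as requiring both conditions, a zero baseline is treated as not comparable, and the Mann–Whitney significance step is deliberately left out.

```python
def is_speed_change_reported(base_median_ns, diff_median_ns,
                             min_rel_change=0.10, min_abs_change_ns=0.3):
    """True when the medians differ by more than 10% AND more than 0.3 ns."""
    if base_median_ns == 0:
        # Assumption: a zero baseline cannot be compared meaningfully.
        return False
    abs_change = abs(diff_median_ns - base_median_ns)
    rel_change = abs_change / base_median_ns
    return rel_change > min_rel_change and abs_change > min_abs_change_ns


def is_allocation_change_reported(base_bytes, diff_bytes, min_rel_change=0.005):
    """True when allocations changed by at least 0.5% of the baseline."""
    if base_bytes == 0:
        return False
    return abs(diff_bytes - base_bytes) / base_bytes >= min_rel_change


# Medians from the AgentWriterBenchmark netcoreapp3.1 row in this report.
base, diff = 458_281.38, 523_625.71
print(round(diff / base, 3))                 # 1.143, the reported diff/base
print(is_speed_change_reported(base, diff))  # True
```

With these medians the relative change is about 14.3%, which clears the 10% threshold, so the benchmark is flagged as slower provided the significance test also passes.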

Benchmark details

Benchmarks.Trace.AgentWriterBenchmark - Slower :warning: More allocations :warning:

Slower :warning: in #3049

| Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces‑netcoreapp3.1 | 1.143 | 458,281.38 | 523,625.71 | |

More allocations :warning: in #3049

| Benchmark | Base Allocated | Diff Allocated | Change | Change % |
|---|---|---|---|---|
| Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces‑net472 | 3.18 KB | 6.78 KB | 3.6 KB | 113.37% |
| Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces‑netcoreapp3.1 | 2.58 KB | 5.34 KB | 2.75 KB | 106.58% |

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | WriteAndFlushEnrichedTraces | net472 | 719μs | 673ns | 2.52μs | 0.359 | 0 | 0 | 3.18 KB |
| master | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 458μs | 220ns | 824ns | 0 | 0 | 0 | 2.58 KB |
| #3049 | WriteAndFlushEnrichedTraces | net472 | 763μs | 490ns | 1.9μs | 0.762 | 0 | 0 | 6.78 KB |
| #3049 | WriteAndFlushEnrichedTraces | netcoreapp3.1 | 524μs | 384ns | 1.49μs | 0 | 0 | 0 | 5.34 KB |

Benchmarks.Trace.AppSecBodyBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | AllCycleSimpleBody | net472 | 1.71μs | 4.24ns | 16.4ns | 0.236 | 0 | 0 | 1.49 KB |
| master | AllCycleSimpleBody | netcoreapp3.1 | 1.87μs | 1.95ns | 7.3ns | 0.0187 | 0 | 0 | 1.37 KB |
| master | AllCycleMoreComplexBody | net472 | 16.8μs | 56.3ns | 218ns | 1.39 | 0.0249 | 0 | 8.75 KB |
| master | AllCycleMoreComplexBody | netcoreapp3.1 | 14.2μs | 18.3ns | 70.9ns | 0.107 | 0 | 0 | 7.85 KB |
| master | BodyExtractorSimpleBody | net472 | 251ns | 0.19ns | 0.737ns | 0.0574 | 0 | 0 | 361 B |
| master | BodyExtractorSimpleBody | netcoreapp3.1 | 226ns | 0.427ns | 1.6ns | 0.0038 | 0 | 0 | 272 B |
| master | BodyExtractorMoreComplexBody | net472 | 14.6μs | 9.35ns | 33.7ns | 1.21 | 0.0147 | 0 | 7.62 KB |
| master | BodyExtractorMoreComplexBody | netcoreapp3.1 | 12.2μs | 12.1ns | 46.8ns | 0.0905 | 0 | 0 | 6.75 KB |
| #3049 | AllCycleSimpleBody | net472 | 1.68μs | 1.38ns | 5.18ns | 0.237 | 0 | 0 | 1.49 KB |
| #3049 | AllCycleSimpleBody | netcoreapp3.1 | 1.81μs | 2.55ns | 9.87ns | 0.0187 | 0 | 0 | 1.37 KB |
| #3049 | AllCycleMoreComplexBody | net472 | 16.9μs | 6.42ns | 24ns | 1.39 | 0.0252 | 0 | 8.75 KB |
| #3049 | AllCycleMoreComplexBody | netcoreapp3.1 | 14.3μs | 12.6ns | 47.2ns | 0.106 | 0 | 0 | 7.85 KB |
| #3049 | BodyExtractorSimpleBody | net472 | 263ns | 0.881ns | 3.41ns | 0.0574 | 0 | 0 | 361 B |
| #3049 | BodyExtractorSimpleBody | netcoreapp3.1 | 221ns | 0.2ns | 0.748ns | 0.00381 | 0 | 0 | 272 B |
| #3049 | BodyExtractorMoreComplexBody | net472 | 14.8μs | 10.2ns | 39.6ns | 1.21 | 0.0148 | 0 | 7.62 KB |
| #3049 | BodyExtractorMoreComplexBody | netcoreapp3.1 | 12.1μs | 18ns | 69.5ns | 0.0911 | 0 | 0 | 6.75 KB |

Benchmarks.Trace.AspNetCoreBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendRequest | net472 | 0ns | 0ns | 0ns | 0 | 0 | 0 | 0 b |
| master | SendRequest | netcoreapp3.1 | 181μs | 166ns | 642ns | 0.181 | 0 | 0 | 20.33 KB |
| #3049 | SendRequest | net472 | 0ns | 0ns | 0ns | 0 | 0 | 0 | 0 b |
| #3049 | SendRequest | netcoreapp3.1 | 178μs | 105ns | 405ns | 0.268 | 0 | 0 | 20.33 KB |

Benchmarks.Trace.DbCommandBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | ExecuteNonQuery | net472 | 1.51μs | 0.67ns | 2.59ns | 0.126 | 0.000761 | 0 | 794 B |
| master | ExecuteNonQuery | netcoreapp3.1 | 1.27μs | 0.378ns | 1.47ns | 0.0114 | 0 | 0 | 824 B |
| #3049 | ExecuteNonQuery | net472 | 1.59μs | 0.52ns | 1.94ns | 0.126 | 0.000804 | 0 | 794 B |
| #3049 | ExecuteNonQuery | netcoreapp3.1 | 1.32μs | 0.938ns | 3.51ns | 0.011 | 0 | 0 | 824 B |

Benchmarks.Trace.ElasticsearchBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | CallElasticsearch | net472 | 2.12μs | 1.05ns | 4.06ns | 0.159 | 0 | 0 | 1 KB |
| master | CallElasticsearch | netcoreapp3.1 | 1.41μs | 0.982ns | 3.67ns | 0.0134 | 0 | 0 | 984 B |
| master | CallElasticsearchAsync | net472 | 2.45μs | 0.989ns | 3.83ns | 0.181 | 0 | 0 | 1.14 KB |
| master | CallElasticsearchAsync | netcoreapp3.1 | 1.48μs | 0.503ns | 1.88ns | 0.0148 | 0 | 0 | 1.1 KB |
| #3049 | CallElasticsearch | net472 | 2.22μs | 0.689ns | 2.67ns | 0.159 | 0 | 0 | 1 KB |
| #3049 | CallElasticsearch | netcoreapp3.1 | 1.41μs | 0.503ns | 1.95ns | 0.0132 | 0 | 0 | 984 B |
| #3049 | CallElasticsearchAsync | net472 | 2.39μs | 0.435ns | 1.69ns | 0.181 | 0 | 0 | 1.14 KB |
| #3049 | CallElasticsearchAsync | netcoreapp3.1 | 1.53μs | 0.652ns | 2.44ns | 0.0147 | 0 | 0 | 1.1 KB |

Benchmarks.Trace.GraphQLBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | ExecuteAsync | net472 | 2.39μs | 5.36ns | 20ns | 0.2 | 0 | 0 | 1.26 KB |
| master | ExecuteAsync | netcoreapp3.1 | 1.63μs | 2.21ns | 8.28ns | 0.0164 | 0 | 0 | 1.22 KB |
| #3049 | ExecuteAsync | net472 | 2.47μs | 4.45ns | 17.3ns | 0.2 | 0 | 0 | 1.26 KB |
| #3049 | ExecuteAsync | netcoreapp3.1 | 1.58μs | 1.05ns | 3.79ns | 0.0159 | 0 | 0 | 1.22 KB |

Benchmarks.Trace.HttpClientBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendAsync | net472 | 4.92μs | 5.04ns | 19.5ns | 0.392 | 0 | 0 | 2.48 KB |
| master | SendAsync | netcoreapp3.1 | 3.22μs | 2.52ns | 9.43ns | 0.032 | 0 | 0 | 2.36 KB |
| #3049 | SendAsync | net472 | 5.05μs | 2.33ns | 9.02ns | 0.392 | 0 | 0 | 2.48 KB |
| #3049 | SendAsync | netcoreapp3.1 | 3.23μs | 2.78ns | 10.8ns | 0.0321 | 0 | 0 | 2.36 KB |

Benchmarks.Trace.ILoggerBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net472 | 2.73μs | 0.783ns | 2.82ns | 0.263 | 0 | 0 | 1.66 KB |
| master | EnrichedLog | netcoreapp3.1 | 2.36μs | 0.97ns | 3.76ns | 0.0238 | 0 | 0 | 1.73 KB |
| #3049 | EnrichedLog | net472 | 2.88μs | 1.27ns | 4.91ns | 0.263 | 0 | 0 | 1.66 KB |
| #3049 | EnrichedLog | netcoreapp3.1 | 2.37μs | 0.876ns | 3.28ns | 0.0237 | 0 | 0 | 1.73 KB |

Benchmarks.Trace.Log4netBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net472 | 146μs | 114ns | 440ns | 0.659 | 0.22 | 0 | 4.5 KB |
| master | EnrichedLog | netcoreapp3.1 | 112μs | 144ns | 557ns | 0.0562 | 0 | 0 | 4.38 KB |
| #3049 | EnrichedLog | net472 | 145μs | 64.3ns | 241ns | 0.656 | 0.219 | 0 | 4.5 KB |
| #3049 | EnrichedLog | netcoreapp3.1 | 112μs | 79.8ns | 309ns | 0.0565 | 0 | 0 | 4.38 KB |

Benchmarks.Trace.NLogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net472 | 5.42μs | 5.29ns | 19.8ns | 0.543 | 0.0027 | 0 | 3.43 KB |
| master | EnrichedLog | netcoreapp3.1 | 4.39μs | 2.26ns | 8.77ns | 0.0525 | 0 | 0 | 3.8 KB |
| #3049 | EnrichedLog | net472 | 5.24μs | 3.71ns | 14.4ns | 0.545 | 0.00262 | 0 | 3.43 KB |
| #3049 | EnrichedLog | netcoreapp3.1 | 4.37μs | 2.34ns | 8.74ns | 0.0523 | 0 | 0 | 3.8 KB |

Benchmarks.Trace.RedisBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendReceive | net472 | 2.02μs | 1.61ns | 6.25ns | 0.194 | 0 | 0 | 1.22 KB |
| master | SendReceive | netcoreapp3.1 | 1.72μs | 0.607ns | 2.27ns | 0.0163 | 0 | 0 | 1.21 KB |
| #3049 | SendReceive | net472 | 2μs | 0.85ns | 3.18ns | 0.193 | 0 | 0 | 1.22 KB |
| #3049 | SendReceive | netcoreapp3.1 | 1.69μs | 0.611ns | 2.2ns | 0.0162 | 0 | 0 | 1.21 KB |

Benchmarks.Trace.SerilogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net472 | 4.63μs | 1.88ns | 7.05ns | 0.328 | 0 | 0 | 2.08 KB |
| master | EnrichedLog | netcoreapp3.1 | 4.22μs | 1.37ns | 5.32ns | 0.0232 | 0 | 0 | 1.69 KB |
| #3049 | EnrichedLog | net472 | 4.65μs | 1.3ns | 5.03ns | 0.33 | 0 | 0 | 2.08 KB |
| #3049 | EnrichedLog | netcoreapp3.1 | 4.15μs | 1.43ns | 5.56ns | 0.0228 | 0 | 0 | 1.69 KB |

Benchmarks.Trace.SpanBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan | net472 | 849ns | 0.263ns | 0.985ns | 0.105 | 0 | 0 | 658 B |
| master | StartFinishSpan | netcoreapp3.1 | 762ns | 0.226ns | 0.875ns | 0.00877 | 0 | 0 | 648 B |
| master | StartFinishScope | net472 | 1.09μs | 0.281ns | 1.05ns | 0.117 | 0 | 0 | 738 B |
| master | StartFinishScope | netcoreapp3.1 | 888ns | 0.66ns | 2.47ns | 0.0101 | 0 | 0 | 768 B |
| #3049 | StartFinishSpan | net472 | 859ns | 0.137ns | 0.493ns | 0.104 | 0 | 0 | 658 B |
| #3049 | StartFinishSpan | netcoreapp3.1 | 765ns | 0.25ns | 0.969ns | 0.00856 | 0 | 0 | 648 B |
| #3049 | StartFinishScope | net472 | 1.06μs | 0.61ns | 2.36ns | 0.117 | 0 | 0 | 738 B |
| #3049 | StartFinishScope | netcoreapp3.1 | 909ns | 0.416ns | 1.56ns | 0.0103 | 0 | 0 | 768 B |

Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin | net472 | 1.19μs | 0.319ns | 1.23ns | 0.117 | 0 | 0 | 738 B |
| master | RunOnMethodBegin | netcoreapp3.1 | 1.01μs | 0.491ns | 1.84ns | 0.0101 | 0 | 0 | 768 B |
| #3049 | RunOnMethodBegin | net472 | 1.24μs | 0.272ns | 0.979ns | 0.117 | 0 | 0 | 738 B |
| #3049 | RunOnMethodBegin | netcoreapp3.1 | 1.03μs | 1.38ns | 5.15ns | 0.0105 | 0 | 0 | 768 B |

andrewlock • Aug 03 '22 23:08

Since there were a lot of changes in StatsFeature and the DiscoveryService, I created a new PR to implement the feature: #3152

zacharycmontoya • Sep 01 '22 22:09