dd-trace-dotnet [Tracer] Stats Computation: Disable stats computation and/or dropping P0 traces when the agent is not compatible

Open zacharycmontoya opened this issue 3 years ago • 1 comments

Summary of changes

When stats computation is enabled, the tracer sends a request to the trace agent's /info endpoint to ensure that this feature is supported by the trace agent. Based on the response, either the entire feature is disabled or only the dropping of P0 traces is disabled.

Built on top of https://github.com/DataDog/dd-trace-dotnet/pull/3048, this is PR 3/3 in an attempt to break down the massive PR https://github.com/DataDog/dd-trace-dotnet/pull/2988.

Reason for change

Implementation details

StatsAggregator initialization now executes Endpoint Discovery

Background: When stats computation is enabled, the StatsAggregator object is responsible for buffering stats and periodically sending them to the trace agent's stats endpoint. Stats are computed by passing an array of spans to the AddRange API, which the AgentWriter calls when a local trace is finished and is buffered/serialized.

Much like the LiveDebugger initialization, the constructor of StatsAggregator will invoke a task on the threadpool to ping the agent URL at the /info endpoint to check what endpoints and features it supports. We specifically check for two properties:

endpoints[] contains "/v0.6/stats". The true/false result is set on StatsAggregator.CanComputeStats.
client_drop_p0s is set to true. The true/false result is set on StatsAggregator.CanDropP0s.
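
The two checks above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the tracer's C# code: the function name `read_agent_capabilities`, the `AgentCapabilities` type, and the assumption that the /info response is a JSON object with an `endpoints` array and a `client_drop_p0s` boolean are all hypothetical details chosen for the example.

```python
import json
from typing import NamedTuple

# Endpoint path named in the PR description.
STATS_ENDPOINT = "/v0.6/stats"


class AgentCapabilities(NamedTuple):
    """Hypothetical container mirroring CanComputeStats / CanDropP0s."""
    can_compute_stats: bool
    can_drop_p0s: bool


def read_agent_capabilities(info_json: str) -> AgentCapabilities:
    """Derive the two flags from an /info response body (assumed to be JSON)."""
    info = json.loads(info_json)

    # Check 1: the endpoints list must contain the stats endpoint.
    endpoints = info.get("endpoints") or []
    can_compute_stats = STATS_ENDPOINT in endpoints

    # Check 2: only an explicit JSON `true` counts; a missing key means False.
    can_drop_p0s = info.get("client_drop_p0s") is True

    return AgentCapabilities(can_compute_stats, can_drop_p0s)
```

An agent that lists the stats endpoint but omits `client_drop_p0s` would yield `can_compute_stats=True, can_drop_p0s=False`, which corresponds to the case where stats are computed but P0 traces are still sent.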

Note: Since spans and traces can finish while endpoint discovery is still in progress, the StatsAggregator keeps storing stats points until discovery completes. If initialization succeeds, the stats aggregator can start sending stats to the trace agent. If initialization fails, the stats aggregator stops storing new stats points and refuses to send stats to the trace agent.
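
The buffering behaviour in the note can be pictured as a small three-state machine. The sketch below is an assumption-laden illustration in Python rather than the actual StatsAggregator: the class and method names are hypothetical, it is single-threaded (the real discovery runs on the thread pool and would need synchronization), and discarding already-buffered points on failure is an assumption the description does not confirm.

```python
from enum import Enum


class DiscoveryState(Enum):
    PENDING = "pending"          # /info request still in flight
    SUPPORTED = "supported"      # agent exposes the stats endpoint
    UNSUPPORTED = "unsupported"  # agent cannot accept stats


class BufferingAggregator:
    """Hypothetical stand-in for the stats aggregator's gating logic."""

    def __init__(self):
        self.state = DiscoveryState.PENDING
        self.buffer = []

    def add(self, point) -> bool:
        """Store a stats point unless discovery has already failed."""
        if self.state is DiscoveryState.UNSUPPORTED:
            return False
        self.buffer.append(point)
        return True

    def on_discovery_complete(self, can_compute_stats: bool) -> None:
        """Record the discovery outcome."""
        if can_compute_stats:
            self.state = DiscoveryState.SUPPORTED
        else:
            self.state = DiscoveryState.UNSUPPORTED
            # Assumption: points buffered during discovery are discarded.
            self.buffer.clear()

    def flush(self, send) -> int:
        """Send buffered points only once support is confirmed; return the count sent."""
        if self.state is not DiscoveryState.SUPPORTED or not self.buffer:
            return 0
        pending, self.buffer = self.buffer, []
        send(pending)
        return len(pending)
```

Points added while the state is `PENDING` are kept, so nothing is lost if the agent turns out to be compatible, while a failed discovery makes both `add` and `flush` no-ops.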

Small changes

The DiscoveryService.Create method was refactored because the IConfigurationSource parameter was redundant.
The IDiscoveryService interface has been expanded to include a StatsEndpoint property and a ClientDropP0s property.
The MockTracerAgent test class and all of its subclasses have been updated to accept an AgentConfiguration object which allows us to mock different agent capabilities. This was used to test scenarios where stats computation was allowed but not the dropping of P0 spans.
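
To illustrate how a configurable mock agent makes the "stats allowed, P0 dropping not allowed" scenario testable, here is a rough Python sketch. It is not the real AgentConfiguration or MockTracerAgent: the class name `MockAgentConfiguration`, its fields, its default endpoint list, and the shape of the generated /info payload are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class MockAgentConfiguration:
    """Hypothetical description of what a fake agent advertises via /info."""
    endpoints: List[str] = field(
        default_factory=lambda: ["/v0.4/traces", "/v0.6/stats"]
    )
    client_drop_p0s: bool = True

    def info_response(self) -> str:
        """Build the JSON body a mock /info handler would return."""
        return json.dumps(
            {"endpoints": self.endpoints, "client_drop_p0s": self.client_drop_p0s}
        )


# Scenario from the description: stats computation allowed, P0 dropping not.
stats_only = MockAgentConfiguration(client_drop_p0s=False)

# Scenario where the agent does not expose the stats endpoint at all.
legacy_agent = MockAgentConfiguration(
    endpoints=["/v0.4/traces"], client_drop_p0s=False
)
```

Each test can then start a mock agent with the configuration it needs and assert on what the tracer sends, instead of depending on a particular real agent version.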

Test coverage

Adds integration tests via StatsTests

Other details

Aug 03 '22 22:08 zacharycmontoya

Benchmarks Report :snail:

Benchmarks for #3049 compared to master:

1 benchmark is slower, with geometric mean 1.143
2 benchmarks have more allocations

The following thresholds were used for comparing the benchmark speeds:

Mann–Whitney U test with a significance level of 5%
Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.
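
The numeric thresholds above can be checked by hand against the medians and allocations in this report. The Python sketch below covers only that threshold arithmetic and deliberately omits the Mann–Whitney U significance test. The function names are hypothetical, and reading "greater than 10% and 0.3 ns" as "ratio above 1.10 and absolute difference above 0.3 ns" is an assumption.

```python
def speed_ratio(base_ns: float, diff_ns: float) -> float:
    """diff/base ratio of the median timings, as in the report's table."""
    return diff_ns / base_ns


def exceeds_speed_threshold(base_ns: float, diff_ns: float,
                            min_ratio: float = 1.10,
                            min_delta_ns: float = 0.3) -> bool:
    """True when the slowdown is above both 10% and 0.3 ns (significance test omitted)."""
    return (speed_ratio(base_ns, diff_ns) > min_ratio
            and (diff_ns - base_ns) > min_delta_ns)


def allocation_change_pct(base: float, diff: float) -> float:
    """Relative allocation change in percent."""
    return (diff - base) / base * 100.0


def exceeds_allocation_threshold(base: float, diff: float,
                                 ignore_below_pct: float = 0.5) -> bool:
    """True when the allocation change is not below the 0.5% cut-off."""
    return abs(allocation_change_pct(base, diff)) >= ignore_below_pct


# Median timings (ns) reported for WriteAndFlushEnrichedTraces on netcoreapp3.1.
ratio = speed_ratio(458_281.38, 523_625.71)
print(round(ratio, 3))  # 1.143, matching the diff/base column
```

With these definitions, a change such as StartFinishScope on netcoreapp3.1 (888 ns to 909 ns, a ratio of about 1.02) stays under the 10% bar, which is consistent with it being reported as "Same speed". The allocation percentages in the report are presumably computed from unrounded byte counts, so recomputing them from the rounded KB values gives slightly different figures.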

Benchmark details

Benchmarks.Trace.AgentWriterBenchmark - Slower :warning: More allocations :warning:

Slower :warning: in #3049

Benchmark	diff/base	Base Median (ns)	Diff Median (ns)	Modality
Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces‑netcoreapp3.1	1.143	458,281.38	523,625.71

More allocations :warning: in #3049

Benchmark	Base Allocated	Diff Allocated	Change	Change %
Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces‑net472	3.18 KB	6.78 KB	3.6 KB	113.37%
Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces‑netcoreapp3.1	2.58 KB	5.34 KB	2.75 KB	106.58%

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`WriteAndFlushEnrichedTraces`	net472	719μs	673ns	2.52μs	0.359	3.18 KB
master	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	458μs	220ns	824ns	0	2.58 KB
#3049	`WriteAndFlushEnrichedTraces`	net472	763μs	490ns	1.9μs	0.762	6.78 KB
#3049	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	524μs	384ns	1.49μs	0	5.34 KB

Benchmarks.Trace.AppSecBodyBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`AllCycleSimpleBody`	net472	1.71μs	4.24ns	16.4ns	0.236	0	1.49 KB
master	`AllCycleSimpleBody`	netcoreapp3.1	1.87μs	1.95ns	7.3ns	0.0187	0	1.37 KB
master	`AllCycleMoreComplexBody`	net472	16.8μs	56.3ns	218ns	1.39	0.0249	8.75 KB
master	`AllCycleMoreComplexBody`	netcoreapp3.1	14.2μs	18.3ns	70.9ns	0.107	0	7.85 KB
master	`BodyExtractorSimpleBody`	net472	251ns	0.19ns	0.737ns	0.0574	0	361 B
master	`BodyExtractorSimpleBody`	netcoreapp3.1	226ns	0.427ns	1.6ns	0.0038	0	272 B
master	`BodyExtractorMoreComplexBody`	net472	14.6μs	9.35ns	33.7ns	1.21	0.0147	7.62 KB
master	`BodyExtractorMoreComplexBody`	netcoreapp3.1	12.2μs	12.1ns	46.8ns	0.0905	0	6.75 KB
#3049	`AllCycleSimpleBody`	net472	1.68μs	1.38ns	5.18ns	0.237	0	1.49 KB
#3049	`AllCycleSimpleBody`	netcoreapp3.1	1.81μs	2.55ns	9.87ns	0.0187	0	1.37 KB
#3049	`AllCycleMoreComplexBody`	net472	16.9μs	6.42ns	24ns	1.39	0.0252	8.75 KB
#3049	`AllCycleMoreComplexBody`	netcoreapp3.1	14.3μs	12.6ns	47.2ns	0.106	0	7.85 KB
#3049	`BodyExtractorSimpleBody`	net472	263ns	0.881ns	3.41ns	0.0574	0	361 B
#3049	`BodyExtractorSimpleBody`	netcoreapp3.1	221ns	0.2ns	0.748ns	0.00381	0	272 B
#3049	`BodyExtractorMoreComplexBody`	net472	14.8μs	10.2ns	39.6ns	1.21	0.0148	7.62 KB
#3049	`BodyExtractorMoreComplexBody`	netcoreapp3.1	12.1μs	18ns	69.5ns	0.0911	0	6.75 KB

Benchmarks.Trace.AspNetCoreBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`SendRequest`	net472	0ns	0ns	0ns	0	0 b
master	`SendRequest`	netcoreapp3.1	181μs	166ns	642ns	0.181	20.33 KB
#3049	`SendRequest`	net472	0ns	0ns	0ns	0	0 b
#3049	`SendRequest`	netcoreapp3.1	178μs	105ns	405ns	0.268	20.33 KB

Benchmarks.Trace.DbCommandBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`ExecuteNonQuery`	net472	1.51μs	0.67ns	2.59ns	0.126	0.000761	794 B
master	`ExecuteNonQuery`	netcoreapp3.1	1.27μs	0.378ns	1.47ns	0.0114	0	824 B
#3049	`ExecuteNonQuery`	net472	1.59μs	0.52ns	1.94ns	0.126	0.000804	794 B
#3049	`ExecuteNonQuery`	netcoreapp3.1	1.32μs	0.938ns	3.51ns	0.011	0	824 B

Benchmarks.Trace.ElasticsearchBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`CallElasticsearch`	net472	2.12μs	1.05ns	4.06ns	0.159	1 KB
master	`CallElasticsearch`	netcoreapp3.1	1.41μs	0.982ns	3.67ns	0.0134	984 B
master	`CallElasticsearchAsync`	net472	2.45μs	0.989ns	3.83ns	0.181	1.14 KB
master	`CallElasticsearchAsync`	netcoreapp3.1	1.48μs	0.503ns	1.88ns	0.0148	1.1 KB
#3049	`CallElasticsearch`	net472	2.22μs	0.689ns	2.67ns	0.159	1 KB
#3049	`CallElasticsearch`	netcoreapp3.1	1.41μs	0.503ns	1.95ns	0.0132	984 B
#3049	`CallElasticsearchAsync`	net472	2.39μs	0.435ns	1.69ns	0.181	1.14 KB
#3049	`CallElasticsearchAsync`	netcoreapp3.1	1.53μs	0.652ns	2.44ns	0.0147	1.1 KB

Benchmarks.Trace.GraphQLBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`ExecuteAsync`	net472	2.39μs	5.36ns	20ns	0.2	1.26 KB
master	`ExecuteAsync`	netcoreapp3.1	1.63μs	2.21ns	8.28ns	0.0164	1.22 KB
#3049	`ExecuteAsync`	net472	2.47μs	4.45ns	17.3ns	0.2	1.26 KB
#3049	`ExecuteAsync`	netcoreapp3.1	1.58μs	1.05ns	3.79ns	0.0159	1.22 KB

Benchmarks.Trace.HttpClientBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`SendAsync`	net472	4.92μs	5.04ns	19.5ns	0.392	2.48 KB
master	`SendAsync`	netcoreapp3.1	3.22μs	2.52ns	9.43ns	0.032	2.36 KB
#3049	`SendAsync`	net472	5.05μs	2.33ns	9.02ns	0.392	2.48 KB
#3049	`SendAsync`	netcoreapp3.1	3.23μs	2.78ns	10.8ns	0.0321	2.36 KB

Benchmarks.Trace.ILoggerBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`EnrichedLog`	net472	2.73μs	0.783ns	2.82ns	0.263	1.66 KB
master	`EnrichedLog`	netcoreapp3.1	2.36μs	0.97ns	3.76ns	0.0238	1.73 KB
#3049	`EnrichedLog`	net472	2.88μs	1.27ns	4.91ns	0.263	1.66 KB
#3049	`EnrichedLog`	netcoreapp3.1	2.37μs	0.876ns	3.28ns	0.0237	1.73 KB

Benchmarks.Trace.Log4netBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`EnrichedLog`	net472	146μs	114ns	440ns	0.659	0.22	4.5 KB
master	`EnrichedLog`	netcoreapp3.1	112μs	144ns	557ns	0.0562	0	4.38 KB
#3049	`EnrichedLog`	net472	145μs	64.3ns	241ns	0.656	0.219	4.5 KB
#3049	`EnrichedLog`	netcoreapp3.1	112μs	79.8ns	309ns	0.0565	0	4.38 KB

Benchmarks.Trace.NLogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`EnrichedLog`	net472	5.42μs	5.29ns	19.8ns	0.543	0.0027	3.43 KB
master	`EnrichedLog`	netcoreapp3.1	4.39μs	2.26ns	8.77ns	0.0525	0	3.8 KB
#3049	`EnrichedLog`	net472	5.24μs	3.71ns	14.4ns	0.545	0.00262	3.43 KB
#3049	`EnrichedLog`	netcoreapp3.1	4.37μs	2.34ns	8.74ns	0.0523	0	3.8 KB

Benchmarks.Trace.RedisBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`SendReceive`	net472	2.02μs	1.61ns	6.25ns	0.194	1.22 KB
master	`SendReceive`	netcoreapp3.1	1.72μs	0.607ns	2.27ns	0.0163	1.21 KB
#3049	`SendReceive`	net472	2μs	0.85ns	3.18ns	0.193	1.22 KB
#3049	`SendReceive`	netcoreapp3.1	1.69μs	0.611ns	2.2ns	0.0162	1.21 KB

Benchmarks.Trace.SerilogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`EnrichedLog`	net472	4.63μs	1.88ns	7.05ns	0.328	2.08 KB
master	`EnrichedLog`	netcoreapp3.1	4.22μs	1.37ns	5.32ns	0.0232	1.69 KB
#3049	`EnrichedLog`	net472	4.65μs	1.3ns	5.03ns	0.33	2.08 KB
#3049	`EnrichedLog`	netcoreapp3.1	4.15μs	1.43ns	5.56ns	0.0228	1.69 KB

Benchmarks.Trace.SpanBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`StartFinishSpan`	net472	849ns	0.263ns	0.985ns	0.105	658 B
master	`StartFinishSpan`	netcoreapp3.1	762ns	0.226ns	0.875ns	0.00877	648 B
master	`StartFinishScope`	net472	1.09μs	0.281ns	1.05ns	0.117	738 B
master	`StartFinishScope`	netcoreapp3.1	888ns	0.66ns	2.47ns	0.0101	768 B
#3049	`StartFinishSpan`	net472	859ns	0.137ns	0.493ns	0.104	658 B
#3049	`StartFinishSpan`	netcoreapp3.1	765ns	0.25ns	0.969ns	0.00856	648 B
#3049	`StartFinishScope`	net472	1.06μs	0.61ns	2.36ns	0.117	738 B
#3049	`StartFinishScope`	netcoreapp3.1	909ns	0.416ns	1.56ns	0.0103	768 B

Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`RunOnMethodBegin`	net472	1.19μs	0.319ns	1.23ns	0.117	738 B
master	`RunOnMethodBegin`	netcoreapp3.1	1.01μs	0.491ns	1.84ns	0.0101	768 B
#3049	`RunOnMethodBegin`	net472	1.24μs	0.272ns	0.979ns	0.117	738 B
#3049	`RunOnMethodBegin`	netcoreapp3.1	1.03μs	1.38ns	5.15ns	0.0105	768 B

Aug 03 '22 23:08 andrewlock

Since there were a lot of changes in StatsFeature and the DiscoveryService, I created a new PR to implement the feature: #3152

Sep 01 '22 22:09 zacharycmontoya
