Removing start method in favor of on start hook in dogstatsd server component
What does this PR do?
This PR injects the demultiplexer as a dependency of the Dogstatsd server and removes the public Start method from the component interface.
Motivation
Favor injecting the demultiplexer component over relying on a global.
Describe how to test/QA your changes
- Make sure the agent is able to start with the Dogstatsd server enabled.
  - The Dogstatsd server is enabled by default, so just check the agent status or the agent log to see if it's running.
- Make sure that Dogstatsd is still able to send metrics to our backend.
  - Create and run a simple script that sends a Dogstatsd metric.
  - Check the agent status to see if Dogstatsd is sending metrics.
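The script mentioned in the steps above is not included in this description. The following is a minimal sketch of what such a script could look like, assuming the agent accepts Dogstatsd traffic over UDP on 127.0.0.1:8125 (the usual default). The metric name `qa.dogstatsd.test`, the tag `env:qa`, and the helper names are hypothetical and only serve the illustration.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a Dogstatsd datagram: <name>:<value>|<type>[|#tag1,tag2]."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one datagram over UDP and return the number of bytes sent.

    Host and port are assumptions; adjust them to the agent's configuration.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical counter metric used only for this QA check.
    payload = format_metric("qa.dogstatsd.test", 1, "c", ["env:qa"])
    send_metric(payload)
    print(f"sent: {payload}")
```

UDP is fire-and-forget, so the script succeeds even if nothing is listening. The agent status or agent log remains the place to confirm that the packet was actually received.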
Bloop Bleep... Dogbot Here
Regression Detector Results
Run ID: 408b6e6d-456b-4fac-b1e5-69a63239e8d0
Baseline: 7fad488caacff09e4cc911c07f064f2ee33d5f3c
Comparison: be9fd32f7c4590fd39e9810f86219eb1e22f6b52
Total CPUs: 7
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
No significant changes in experiment optimization goals
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
Experiments ignored for regressions
Regressions in experiments with settings containing erratic: true are ignored.
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | file_to_blackhole | % cpu utilization | +1.06 | [-5.54, +7.67] |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | +1.47 | [+0.05, +2.90] |
| ➖ | file_to_blackhole | % cpu utilization | +1.06 | [-5.54, +7.67] |
| ➖ | process_agent_real_time_mode | memory utilization | +0.87 | [+0.82, +0.92] |
| ➖ | idle | memory utilization | +0.78 | [+0.73, +0.82] |
| ➖ | process_agent_standard_check | memory utilization | +0.15 | [+0.10, +0.21] |
| ➖ | trace_agent_json | ingress throughput | +0.03 | [+0.00, +0.05] |
| ➖ | trace_agent_msgpack | ingress throughput | +0.02 | [+0.00, +0.03] |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.00 | [-0.00, +0.00] |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.00 | [-0.00, +0.00] |
| ➖ | process_agent_standard_check_with_stats | memory utilization | -0.13 | [-0.18, -0.08] |
| ➖ | file_tree | memory utilization | -0.15 | [-0.26, -0.05] |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -0.36 | [-0.41, -0.30] |
| ➖ | otel_to_otel_logs | ingress throughput | -0.92 | [-1.54, -0.30] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we classify a change in performance as a "regression" (a change worth investigating further) if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
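The three criteria above can be sketched as a small predicate. This is an illustration only, not the regression detector's actual implementation; the function name and signature are hypothetical. The sample rows are taken from the tables above, and `file_to_blackhole` is treated as erratic because it is listed under the experiments ignored for regressions.

```python
def is_regression(delta_mean_pct, ci_low, ci_high, erratic=False, tolerance=5.0):
    """Return True only if all three criteria hold (hypothetical helper)."""
    big_enough = abs(delta_mean_pct) >= tolerance      # |Δ mean %| ≥ 5.00%
    ci_excludes_zero = ci_low > 0 or ci_high < 0       # CI does not contain zero
    return big_enough and ci_excludes_zero and not erratic


# (experiment, Δ mean %, CI low, CI high, erratic) copied from the tables above.
rows = [
    ("uds_dogstatsd_to_api_cpu", 1.47, 0.05, 2.90, False),
    ("file_to_blackhole", 1.06, -5.54, 7.67, True),
    ("otel_to_otel_logs", -0.92, -1.54, -0.30, False),
]

for name, delta, low, high, erratic in rows:
    print(name, is_regression(delta, low, high, erratic))
```

None of these rows qualifies: the first and third have confidence intervals that exclude zero, but their effect sizes are well under the 5.00% tolerance.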
/merge
:steam_locomotive: MergeQueue
Pull request added to the queue.
This build is going to start soon! (estimated merge in less than 49m)
Use /merge -c to cancel this operation!
:x: MergeQueue
This PR is rejected because it was updated
If you need support, contact us on Slack #ci-interfaces with those details!
/merge
:steam_locomotive: MergeQueue
This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.
Use /merge -c to cancel this operation!
:steam_locomotive: MergeQueue
Added to the queue.
This build is going to start soon! (estimated merge in less than 48m)
Use /merge -c to cancel this operation!
:x: MergeQueue
Tests failed on this commit 5d242aa
You should fix those tests and then re-add your pull request to the queue!
Details
some checks are failing:
If you need support, contact us on Slack #ci-interfaces with those details!
/merge
:steam_locomotive: MergeQueue
Pull request added to the queue.
This build is going to start soon! (estimated merge in less than 49m)
Use /merge -c to cancel this operation!
:x: MergeQueue
Tests failed on this commit 4ad76ca
You should fix those tests and then re-add your pull request to the queue!
Details
some checks are failing:
If you need support, contact us on Slack #ci-interfaces with those details!
Serverless Benchmark Results
BenchmarkStartEndInvocation comparison between c3e5646370e539a792e4b1ccc219f69156058c91 and 0a6dc64147838ab551f00387b82e838bad2cdf2f.
tl;dr
- Skim down the "vs base" column in each chart. If there is a "~", then there was no statistically significant change to the benchmark. Otherwise, ensure the estimated percent change is either negative or very small.
- The last row of each chart is the geomean. Ensure this percentage is either negative or very small.
What is this benchmarking?
BenchmarkStartEndInvocation measures the amount of time it takes to call the start-invocation and end-invocation endpoints. For universal instrumentation languages (Dotnet, Golang, Java, Ruby), this represents the majority of the duration overhead added by our tracing layer.
The benchmark is run using a large variety of lambda request payloads. In the charts below, there is one row for each event payload type.
How do I interpret these charts?
The charts below come from benchstat. They represent the statistical change in duration (sec/op), memory overhead (B/op), and allocations (allocs/op).
The benchstat docs explain how to interpret these charts.
Before the comparison table, we see common file-level configuration. If there are benchmarks with different configuration (for example, from different packages), benchstat will print separate tables for each configuration.
The table then compares the two input files for each benchmark. It shows the median and 95% confidence interval summaries for each benchmark before and after the change, and an A/B comparison under "vs base". ... The p-value measures how likely it is that any differences were due to random chance (i.e., noise). The "~" means benchstat did not detect a statistically significant difference between the two inputs. ...
Note that "statistically significant" is not the same as "large": with enough low-noise data, even very small changes can be distinguished from noise and considered statistically significant. It is, of course, generally easier to distinguish large changes from noise.
Finally, the last row of the table shows the geometric mean of each column, giving an overall picture of how the benchmarks changed. Proportional changes in the geomean reflect proportional changes in the benchmarks. For example, given n benchmarks, if sec/op for one of them increases by a factor of 2, then the sec/op geomean will increase by a factor of ⁿ√2.
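The geomean property described above can be checked numerically. The sketch below is not benchstat's code; it only illustrates the arithmetic with a hypothetical `geomean` helper. It uses four baseline sec/op values from the table that follows as sample inputs, and the two geomean values reported in the sec/op table.

```python
import math


def geomean(values):
    """Geometric mean, computed via logarithms (hypothetical helper)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Four baseline sec/op values (in µs) used purely as sample inputs.
base = [85.72, 65.62, 63.06, 99.93]

# Double one benchmark and leave the others unchanged.
changed = [2 * base[0]] + base[1:]

# With n benchmarks, the geomean grows by a factor of the n-th root of 2.
ratio = geomean(changed) / geomean(base)
expected = 2 ** (1 / len(base))
print(math.isclose(ratio, expected))

# Percent change between the reported sec/op geomeans (64.46µ -> 64.71µ).
print(round((64.71 / 64.46 - 1) * 100, 2))
```

The last line reproduces the +0.39% shown in the geomean row of the sec/op table.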
Benchmark stats
goos: linux
goarch: amd64
pkg: github.com/DataDog/datadog-agent/pkg/serverless/daemon
cpu: AMD EPYC 7763 64-Core Processor
│ baseline/benchmark.log │ current/benchmark.log │
│ sec/op │ sec/op vs base │
api-gateway-appsec.json 85.72µ ± 3% 82.94µ ± 3% -3.24% (p=0.019 n=10)
api-gateway-kong-appsec.json 65.62µ ± 1% 65.13µ ± 2% -0.74% (p=0.035 n=10)
api-gateway-kong.json 63.06µ ± 2% 62.71µ ± 1% ~ (p=0.128 n=10)
api-gateway-non-proxy-async.json 99.93µ ± 2% 99.76µ ± 1% ~ (p=0.315 n=10)
api-gateway-non-proxy.json 100.08µ ± 2% 99.39µ ± 1% -0.69% (p=0.023 n=10)
api-gateway-websocket-connect.json 66.45µ ± 2% 66.27µ ± 1% ~ (p=0.579 n=10)
api-gateway-websocket-default.json 60.39µ ± 1% 59.57µ ± 2% ~ (p=0.105 n=10)
api-gateway-websocket-disconnect.json 60.10µ ± 1% 59.31µ ± 2% ~ (p=0.089 n=10)
api-gateway.json 111.5µ ± 1% 110.7µ ± 2% ~ (p=0.393 n=10)
application-load-balancer.json 60.16µ ± 2% 60.04µ ± 2% ~ (p=0.579 n=10)
cloudfront.json 45.83µ ± 1% 46.07µ ± 2% ~ (p=0.315 n=10)
cloudwatch-events.json 36.80µ ± 2% 37.15µ ± 2% ~ (p=0.436 n=10)
cloudwatch-logs.json 62.45µ ± 2% 63.44µ ± 2% +1.60% (p=0.011 n=10)
custom.json 28.92µ ± 1% 29.36µ ± 1% +1.51% (p=0.019 n=10)
dynamodb.json 91.96µ ± 1% 92.88µ ± 2% ~ (p=0.063 n=10)
empty.json 26.99µ ± 1% 27.75µ ± 2% +2.81% (p=0.000 n=10)
eventbridge-custom.json 40.34µ ± 2% 40.54µ ± 1% ~ (p=0.393 n=10)
http-api.json 70.43µ ± 1% 71.63µ ± 1% +1.71% (p=0.005 n=10)
kinesis-batch.json 69.57µ ± 3% 69.47µ ± 1% ~ (p=0.912 n=10)
kinesis.json 51.81µ ± 1% 52.86µ ± 2% +2.04% (p=0.015 n=10)
s3.json 57.56µ ± 2% 58.06µ ± 2% ~ (p=0.123 n=10)
sns-batch.json 89.18µ ± 1% 90.38µ ± 1% +1.35% (p=0.019 n=10)
sns.json 63.00µ ± 1% 64.03µ ± 2% +1.65% (p=0.043 n=10)
snssqs.json 103.4µ ± 1% 104.7µ ± 2% ~ (p=0.190 n=10)
snssqs_no_dd_context.json 98.19µ ± 1% 97.71µ ± 1% ~ (p=0.247 n=10)
sqs-aws-header.json 54.09µ ± 2% 54.30µ ± 1% ~ (p=0.190 n=10)
sqs-batch.json 92.87µ ± 2% 93.93µ ± 1% ~ (p=0.063 n=10)
sqs.json 66.53µ ± 4% 67.91µ ± 2% ~ (p=0.190 n=10)
sqs_no_dd_context.json 60.43µ ± 1% 60.58µ ± 2% ~ (p=0.631 n=10)
geomean 64.46µ 64.71µ +0.39%
│ baseline/benchmark.log │ current/benchmark.log │
│ B/op │ B/op vs base │
api-gateway-appsec.json 36.96Ki ± 0% 37.02Ki ± 0% +0.17% (p=0.000 n=10)
api-gateway-kong-appsec.json 26.62Ki ± 0% 26.62Ki ± 0% ~ (p=0.868 n=10)
api-gateway-kong.json 24.11Ki ± 0% 24.11Ki ± 0% ~ (p=0.341 n=10)
api-gateway-non-proxy-async.json 47.76Ki ± 0% 47.81Ki ± 0% +0.11% (p=0.000 n=10)
api-gateway-non-proxy.json 46.97Ki ± 0% 47.02Ki ± 0% +0.11% (p=0.000 n=10)
api-gateway-websocket-connect.json 25.20Ki ± 0% 25.23Ki ± 0% +0.12% (p=0.000 n=10)
api-gateway-websocket-default.json 21.10Ki ± 0% 21.13Ki ± 0% +0.16% (p=0.000 n=10)
api-gateway-websocket-disconnect.json 20.88Ki ± 0% 20.91Ki ± 0% +0.15% (p=0.000 n=10)
api-gateway.json 49.28Ki ± 0% 49.27Ki ± 0% ~ (p=0.956 n=10)
application-load-balancer.json 22.05Ki ± 0% 22.99Ki ± 0% +4.26% (p=0.000 n=10)
cloudfront.json 17.39Ki ± 0% 17.40Ki ± 0% ~ (p=0.085 n=10)
cloudwatch-events.json 11.44Ki ± 0% 11.48Ki ± 0% +0.34% (p=0.000 n=10)
cloudwatch-logs.json 53.11Ki ± 0% 53.11Ki ± 0% ~ (p=0.425 n=10)
custom.json 9.477Ki ± 0% 9.501Ki ± 0% +0.25% (p=0.015 n=10)
dynamodb.json 40.42Ki ± 0% 40.45Ki ± 0% +0.06% (p=0.002 n=10)
empty.json 9.026Ki ± 0% 9.029Ki ± 0% ~ (p=0.754 n=10)
eventbridge-custom.json 13.16Ki ± 0% 13.19Ki ± 0% +0.26% (p=0.005 n=10)
http-api.json 23.43Ki ± 0% 23.51Ki ± 0% +0.34% (p=0.000 n=10)
kinesis-batch.json 26.77Ki ± 0% 26.77Ki ± 0% ~ (p=0.343 n=10)
kinesis.json 17.56Ki ± 0% 17.56Ki ± 0% ~ (p=0.617 n=10)
s3.json 20.07Ki ± 0% 20.11Ki ± 0% +0.21% (p=0.002 n=10)
sns-batch.json 38.38Ki ± 0% 38.39Ki ± 0% ~ (p=0.516 n=10)
sns.json 23.73Ki ± 0% 23.73Ki ± 0% ~ (p=0.724 n=10)
snssqs.json 49.35Ki ± 0% 49.37Ki ± 0% +0.05% (p=0.043 n=10)
snssqs_no_dd_context.json 44.53Ki ± 0% 44.60Ki ± 0% +0.14% (p=0.011 n=10)
sqs-aws-header.json 18.59Ki ± 0% 18.61Ki ± 0% ~ (p=0.699 n=10)
sqs-batch.json 41.37Ki ± 0% 41.39Ki ± 0% ~ (p=0.280 n=10)
sqs.json 25.28Ki ± 0% 25.31Ki ± 0% ~ (p=0.247 n=10)
sqs_no_dd_context.json 20.43Ki ± 0% 20.44Ki ± 0% ~ (p=0.724 n=10)
geomean 25.39Ki 25.45Ki +0.25%
│ baseline/benchmark.log │ current/benchmark.log │
│ allocs/op │ allocs/op vs base │
api-gateway-appsec.json 628.5 ± 0% 628.0 ± 0% ~ (p=0.650 n=10)
api-gateway-kong-appsec.json 487.0 ± 0% 487.0 ± 0% ~ (p=1.000 n=10) ¹
api-gateway-kong.json 465.0 ± 0% 465.0 ± 0% ~ (p=1.000 n=10) ¹
api-gateway-non-proxy-async.json 724.0 ± 0% 724.0 ± 0% ~ (p=1.000 n=10)
api-gateway-non-proxy.json 715.0 ± 0% 715.0 ± 0% ~ (p=1.000 n=10)
api-gateway-websocket-connect.json 452.0 ± 0% 452.0 ± 0% ~ (p=1.000 n=10) ¹
api-gateway-websocket-default.json 378.0 ± 0% 378.0 ± 0% ~ (p=1.000 n=10) ¹
api-gateway-websocket-disconnect.json 368.0 ± 0% 368.0 ± 0% ~ (p=1.000 n=10)
api-gateway.json 789.0 ± 0% 789.0 ± 0% ~ (p=1.000 n=10)
application-load-balancer.json 350.0 ± 0% 351.0 ± 0% +0.29% (p=0.002 n=10)
cloudfront.json 282.0 ± 0% 282.0 ± 0% ~ (p=1.000 n=10)
cloudwatch-events.json 219.0 ± 0% 219.0 ± 0% ~ (p=1.000 n=10) ¹
cloudwatch-logs.json 214.0 ± 0% 214.0 ± 0% ~ (p=1.000 n=10)
custom.json 167.0 ± 0% 167.0 ± 0% ~ (p=1.000 n=10)
dynamodb.json 588.0 ± 0% 588.0 ± 0% ~ (p=0.628 n=10)
empty.json 158.0 ± 0% 158.0 ± 0% ~ (p=1.000 n=10)
eventbridge-custom.json 253.0 ± 0% 253.0 ± 0% ~ (p=0.628 n=10)
http-api.json 431.0 ± 0% 431.0 ± 0% ~ (p=0.365 n=10)
kinesis-batch.json 389.0 ± 0% 389.0 ± 0% ~ (p=1.000 n=10)
kinesis.json 284.0 ± 0% 283.0 ± 0% ~ (p=0.132 n=10)
s3.json 356.0 ± 0% 356.0 ± 0% ~ (p=1.000 n=10)
sns-batch.json 453.0 ± 0% 454.0 ± 0% ~ (p=0.165 n=10)
sns.json 322.0 ± 0% 322.0 ± 0% ~ (p=0.337 n=10)
snssqs.json 423.0 ± 0% 422.0 ± 0% ~ (p=0.120 n=10)
snssqs_no_dd_context.json 398.0 ± 0% 398.0 ± 0% ~ (p=0.926 n=10)
sqs-aws-header.json 272.0 ± 0% 272.5 ± 0% ~ (p=0.269 n=10)
sqs-batch.json 502.0 ± 0% 502.0 ± 0% ~ (p=0.336 n=10)
sqs.json 349.5 ± 0% 350.0 ± 0% ~ (p=0.628 n=10)
sqs_no_dd_context.json 323.0 ± 1% 323.0 ± 0% ~ (p=0.135 n=10)
geomean 374.5 374.5 +0.01%
¹ all samples are equal
/merge
:steam_locomotive: MergeQueue
Pull request added to the queue.
There are 3 builds ahead! (estimated merge in less than 2h)
Use /merge -c to cancel this operation!