datadog-agent Read 128 bit trace ID in End Invocation

What does this PR do?

@duncanista pointed out that we also need to read the upper 64 bits of a trace ID in the End Invocation flow. This is a quick hack to read them there. The right way, of course, would be to use tracer.Inject and tracer.Extract.
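As a rough illustration of what reading the upper 64 bits can look like, here is a minimal Go sketch. It is not the code in this PR. It assumes the upper half arrives as a 16-character hex value under the `_dd.p.tid` key of a comma-separated propagation tag string (for example, the `x-datadog-tags` header) and that the lower half arrives as a decimal string. The helper names `upperTraceIDFromTags` and `fullTraceIDHex` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// upperTraceIDFromTags is a hypothetical helper. It scans a comma-separated
// "key=value" tag string (assumed here to be a propagation tags header value)
// for the _dd.p.tid entry and parses it as 16 hex characters.
func upperTraceIDFromTags(tags string) (uint64, bool) {
	for _, kv := range strings.Split(tags, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(kv), "=")
		if !ok || key != "_dd.p.tid" {
			continue
		}
		// Assumption: the upper 64 bits are always encoded as exactly 16 hex digits.
		if len(value) != 16 {
			return 0, false
		}
		upper, err := strconv.ParseUint(value, 16, 64)
		if err != nil {
			return 0, false
		}
		return upper, true
	}
	return 0, false
}

// fullTraceIDHex renders the two 64-bit halves as one 32-character hex string.
func fullTraceIDHex(upper, lower uint64) string {
	return fmt.Sprintf("%016x%016x", upper, lower)
}

func main() {
	// Made-up example values, not taken from a real invocation.
	lower, err := strconv.ParseUint("1234567890", 10, 64)
	if err != nil {
		panic(err)
	}
	upper, ok := upperTraceIDFromTags("_dd.p.dm=-1,_dd.p.tid=6717fd3300000000")
	if !ok {
		upper = 0 // fall back to a 64-bit trace ID when no upper half is present
	}
	fmt.Println(fullTraceIDHex(upper, lower))
}
```

Falling back to zero for the upper half keeps 64-bit trace IDs working unchanged, which is one reason a manual read like this can serve as a stopgap until tracer.Inject and tracer.Extract are used.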

Motivation

Describe how to test/QA your changes

Possible Drawbacks / Trade-offs

Additional Notes

Oct 23 '24 05:10 agocs

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=48492969 --os-family=ubuntu

Note: This applies to commit 78cb994a

Oct 23 '24 05:10 agent-platform-auto-pr[bot]

Regression Detector

Oct 23 '24 07:10 cit-pr-commenter[bot]

/merge

Nov 04 '24 20:11 agocs

:steam_locomotive: MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals. Note: if you pushed new commits since the last approval, you may need additional approval. You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

Nov 04 '24 20:11 dd-devflow[bot]

/remove

Nov 04 '24 20:11 agocs

:steam_locomotive: Devflow: /remove

Nov 04 '24 20:11 dd-devflow[bot]

:warning: MergeQueue: This merge request was unqueued

If you need support, contact us on Slack #devflow!

Nov 04 '24 20:11 dd-devflow[bot]

Serverless Benchmark Results

BenchmarkStartEndInvocation comparison between 0c2f3b04b2b6a72190dcf53a36adf6c0d711aea7 and bacb5604700b2af18a68d0b265c62084b79a8c52.

tl;dr

Use these benchmarks as an insight tool during development.

Skim down the vs base column in each chart. If there is a ~, then there was no statistically significant change to the benchmark. Otherwise, ensure the estimated percent change is either negative or very small.
The last row of each chart is the geomean. Ensure this percentage is either negative or very small.

What is this benchmarking?

The BenchmarkStartEndInvocation compares the amount of time it takes to call the start-invocation and end-invocation endpoints. For universal instrumentation languages (Dotnet, Golang, Java, Ruby), this represents the majority of the duration overhead added by our tracing layer.

The benchmark is run using a large variety of lambda request payloads. In the charts below, there is one row for each event payload type.

How do I interpret these charts?

The charts below come from benchstat. They represent the statistical change in duration (sec/op), memory overhead (B/op), and allocations (allocs/op).

The benchstat docs explain how to interpret these charts.

Before the comparison table, we see common file-level configuration. If there are benchmarks with different configuration (for example, from different packages), benchstat will print separate tables for each configuration.

The table then compares the two input files for each benchmark. It shows the median and 95% confidence interval summaries for each benchmark before and after the change, and an A/B comparison under "vs base". ... The p-value measures how likely it is that any differences were due to random chance (i.e., noise). The "~" means benchstat did not detect a statistically significant difference between the two inputs. ...

Note that "statistically significant" is not the same as "large": with enough low-noise data, even very small changes can be distinguished from noise and considered statistically significant. It is, of course, generally easier to distinguish large changes from noise.

Finally, the last row of the table shows the geometric mean of each column, giving an overall picture of how the benchmarks changed. Proportional changes in the geomean reflect proportional changes in the benchmarks. For example, given n benchmarks, if sec/op for one of them increases by a factor of 2, then the sec/op geomean will increase by a factor of ⁿ√2.

I need more help

First off, do not worry if the benchmarks are failing. They are not tests. The intention is for them to be a tool for you to use during development.

If you would like a hand interpreting the results, come chat with us in #serverless-agent in the internal Datadog Slack or in #serverless in the public Datadog Slack. We're happy to help!

Benchmark stats

goos: linux
goarch: amd64
pkg: github.com/DataDog/datadog-agent/pkg/serverless/daemon
cpu: AMD EPYC 7763 64-Core Processor                
                                      │ baseline/benchmark.log │       current/benchmark.log        │
                                      │         sec/op         │   sec/op     vs base               │
api-gateway-appsec.json                            86.89µ ± 3%   85.73µ ± 2%       ~ (p=0.247 n=10)
api-gateway-kong-appsec.json                       68.02µ ± 9%   66.61µ ± 2%  -2.06% (p=0.029 n=10)
api-gateway-kong.json                              65.80µ ± 4%   66.27µ ± 3%       ~ (p=0.529 n=10)
api-gateway-non-proxy-async.json                   103.6µ ± 2%   105.2µ ± 1%  +1.52% (p=0.011 n=10)
api-gateway-non-proxy.json                         105.3µ ± 1%   105.3µ ± 1%       ~ (p=0.971 n=10)
api-gateway-websocket-connect.json                 69.04µ ± 1%   70.32µ ± 1%  +1.85% (p=0.002 n=10)
api-gateway-websocket-default.json                 62.12µ ± 1%   63.66µ ± 1%  +2.48% (p=0.000 n=10)
api-gateway-websocket-disconnect.json              62.11µ ± 1%   64.10µ ± 2%  +3.20% (p=0.000 n=10)
api-gateway.json                                   112.5µ ± 3%   115.2µ ± 1%       ~ (p=0.052 n=10)
application-load-balancer.json                     62.81µ ± 1%   65.09µ ± 1%  +3.62% (p=0.000 n=10)
cloudfront.json                                    47.40µ ± 3%   48.84µ ± 2%  +3.05% (p=0.001 n=10)
cloudwatch-events.json                             37.82µ ± 3%   39.87µ ± 1%  +5.42% (p=0.000 n=10)
cloudwatch-logs.json                               65.06µ ± 2%   67.35µ ± 1%  +3.52% (p=0.000 n=10)
custom.json                                        30.42µ ± 2%   32.20µ ± 2%  +5.84% (p=0.000 n=10)
dynamodb.json                                      91.71µ ± 2%   93.29µ ± 1%  +1.72% (p=0.007 n=10)
empty.json                                         28.81µ ± 3%   30.33µ ± 2%  +5.26% (p=0.000 n=10)
eventbridge-custom.json                            46.88µ ± 2%   47.82µ ± 5%  +2.02% (p=0.002 n=10)
eventbridge-no-bus.json                            45.94µ ± 2%   47.49µ ± 2%  +3.38% (p=0.001 n=10)
eventbridge-no-timestamp.json                      46.12µ ± 1%   47.27µ ± 1%  +2.50% (p=0.000 n=10)
eventbridgesns.json                                62.69µ ± 2%   65.20µ ± 2%  +4.01% (p=0.000 n=10)
eventbridgesqs.json                                71.96µ ± 3%   72.92µ ± 1%       ~ (p=0.165 n=10)
http-api.json                                      73.93µ ± 3%   73.74µ ± 2%       ~ (p=0.190 n=10)
kinesis-batch.json                                 72.32µ ± 3%   71.48µ ± 2%       ~ (p=0.315 n=10)
kinesis.json                                       55.63µ ± 2%   54.83µ ± 2%  -1.43% (p=0.043 n=10)
s3.json                                            61.13µ ± 2%   60.91µ ± 1%       ~ (p=0.739 n=10)
sns-batch.json                                     91.54µ ± 2%   92.69µ ± 1%       ~ (p=0.105 n=10)
sns.json                                           68.12µ ± 2%   68.92µ ± 2%       ~ (p=0.165 n=10)
snssqs.json                                        119.2µ ± 2%   119.9µ ± 2%       ~ (p=0.247 n=10)
snssqs_no_dd_context.json                          107.5µ ± 2%   107.9µ ± 2%       ~ (p=0.631 n=10)
sqs-aws-header.json                                60.68µ ± 2%   59.70µ ± 2%       ~ (p=0.063 n=10)
sqs-batch.json                                     98.98µ ± 2%   96.31µ ± 2%  -2.69% (p=0.003 n=10)
sqs.json                                           73.02µ ± 4%   72.77µ ± 2%       ~ (p=0.579 n=10)
sqs_no_dd_context.json                             69.56µ ± 2%   68.85µ ± 3%       ~ (p=0.315 n=10)
stepfunction.json                                  47.78µ ± 3%   49.19µ ± 4%       ~ (p=0.143 n=10)
geomean                                            65.94µ        66.86µ       +1.39%

                                      │ baseline/benchmark.log │        current/benchmark.log        │
                                      │          B/op          │     B/op      vs base               │
api-gateway-appsec.json                           37.34Ki ± 0%   37.35Ki ± 0%  +0.04% (p=0.022 n=10)
api-gateway-kong-appsec.json                      26.93Ki ± 0%   26.95Ki ± 0%  +0.06% (p=0.006 n=10)
api-gateway-kong.json                             24.43Ki ± 0%   24.45Ki ± 0%  +0.08% (p=0.000 n=10)
api-gateway-non-proxy-async.json                  48.13Ki ± 0%   48.13Ki ± 0%       ~ (p=1.000 n=10)
api-gateway-non-proxy.json                        47.35Ki ± 0%   47.35Ki ± 0%       ~ (p=0.542 n=10)
api-gateway-websocket-connect.json                25.53Ki ± 0%   25.54Ki ± 0%       ~ (p=0.565 n=10)
api-gateway-websocket-default.json                21.44Ki ± 0%   21.44Ki ± 0%       ~ (p=0.158 n=10)
api-gateway-websocket-disconnect.json             21.22Ki ± 0%   21.22Ki ± 0%       ~ (p=0.127 n=10)
api-gateway.json                                  49.59Ki ± 0%   49.60Ki ± 0%       ~ (p=0.780 n=10)
application-load-balancer.json                    23.31Ki ± 0%   23.32Ki ± 0%       ~ (p=0.323 n=10)
cloudfront.json                                   17.67Ki ± 0%   17.69Ki ± 0%  +0.11% (p=0.006 n=10)
cloudwatch-events.json                            11.73Ki ± 0%   11.76Ki ± 0%  +0.23% (p=0.000 n=10)
cloudwatch-logs.json                              53.38Ki ± 0%   53.39Ki ± 0%  +0.02% (p=0.022 n=10)
custom.json                                       9.746Ki ± 0%   9.775Ki ± 0%  +0.30% (p=0.000 n=10)
dynamodb.json                                     40.81Ki ± 0%   40.82Ki ± 0%       ~ (p=0.075 n=10)
empty.json                                        9.294Ki ± 0%   9.339Ki ± 0%  +0.49% (p=0.000 n=10)
eventbridge-custom.json                           15.01Ki ± 0%   15.02Ki ± 0%       ~ (p=0.838 n=10)
eventbridge-no-bus.json                           13.99Ki ± 0%   14.00Ki ± 0%       ~ (p=0.197 n=10)
eventbridge-no-timestamp.json                     14.04Ki ± 0%   14.03Ki ± 0%       ~ (p=1.000 n=10)
eventbridgesns.json                               20.92Ki ± 0%   21.00Ki ± 0%  +0.40% (p=0.023 n=10)
eventbridgesqs.json                               25.17Ki ± 0%   25.17Ki ± 0%       ~ (p=0.516 n=10)
http-api.json                                     23.93Ki ± 0%   23.94Ki ± 0%       ~ (p=0.956 n=10)
kinesis-batch.json                                27.16Ki ± 0%   27.15Ki ± 0%       ~ (p=0.516 n=10)
kinesis.json                                      17.91Ki ± 0%   17.92Ki ± 0%       ~ (p=0.469 n=10)
s3.json                                           20.49Ki ± 0%   20.50Ki ± 1%       ~ (p=0.796 n=10)
sns-batch.json                                    39.87Ki ± 0%   39.99Ki ± 0%       ~ (p=0.143 n=10)
sns.json                                          25.15Ki ± 1%   25.19Ki ± 0%       ~ (p=0.218 n=10)
snssqs.json                                       53.87Ki ± 0%   53.94Ki ± 0%       ~ (p=0.353 n=10)
snssqs_no_dd_context.json                         47.68Ki ± 0%   47.64Ki ± 0%       ~ (p=0.160 n=10)
sqs-aws-header.json                               19.44Ki ± 1%   19.41Ki ± 1%       ~ (p=0.955 n=10)
sqs-batch.json                                    42.35Ki ± 1%   42.25Ki ± 0%       ~ (p=0.225 n=10)
sqs.json                                          26.30Ki ± 1%   26.25Ki ± 1%       ~ (p=0.280 n=10)
sqs_no_dd_context.json                            21.84Ki ± 1%   21.88Ki ± 1%       ~ (p=0.853 n=10)
stepfunction.json                                 14.29Ki ± 1%   14.29Ki ± 1%       ~ (p=0.543 n=10)
geomean                                           24.61Ki        24.62Ki       +0.07%

                                      │ baseline/benchmark.log │        current/benchmark.log        │
                                      │       allocs/op        │ allocs/op   vs base                 │
api-gateway-appsec.json                             629.0 ± 0%   630.5 ± 0%  +0.24% (p=0.001 n=10)
api-gateway-kong-appsec.json                        488.0 ± 0%   489.0 ± 0%  +0.20% (p=0.000 n=10)
api-gateway-kong.json                               466.0 ± 0%   467.0 ± 0%  +0.21% (p=0.000 n=10)
api-gateway-non-proxy-async.json                    725.0 ± 0%   725.0 ± 0%       ~ (p=1.000 n=10)
api-gateway-non-proxy.json                          716.0 ± 0%   716.0 ± 0%       ~ (p=1.000 n=10)
api-gateway-websocket-connect.json                  453.0 ± 0%   453.0 ± 0%       ~ (p=1.000 n=10) ¹
api-gateway-websocket-default.json                  379.0 ± 0%   379.0 ± 0%       ~ (p=1.000 n=10)
api-gateway-websocket-disconnect.json               370.0 ± 0%   370.0 ± 0%       ~ (p=0.582 n=10)
api-gateway.json                                    791.0 ± 0%   791.0 ± 0%       ~ (p=1.000 n=10)
application-load-balancer.json                      353.0 ± 0%   353.0 ± 0%       ~ (p=1.000 n=10) ¹
cloudfront.json                                     284.0 ± 0%   285.0 ± 0%  +0.35% (p=0.000 n=10)
cloudwatch-events.json                              220.0 ± 0%   221.0 ± 0%  +0.45% (p=0.000 n=10)
cloudwatch-logs.json                                215.0 ± 0%   216.0 ± 0%  +0.47% (p=0.002 n=10)
custom.json                                         168.0 ± 0%   169.0 ± 1%  +0.60% (p=0.000 n=10)
dynamodb.json                                       589.0 ± 0%   590.0 ± 0%  +0.17% (p=0.001 n=10)
empty.json                                          159.0 ± 1%   161.0 ± 0%  +1.26% (p=0.000 n=10)
eventbridge-custom.json                             266.0 ± 0%   266.0 ± 0%       ~ (p=0.303 n=10)
eventbridge-no-bus.json                             257.0 ± 0%   257.5 ± 0%       ~ (p=0.450 n=10)
eventbridge-no-timestamp.json                       258.0 ± 0%   258.0 ± 0%       ~ (p=1.000 n=10)
eventbridgesns.json                                 325.0 ± 0%   326.0 ± 0%       ~ (p=0.082 n=10)
eventbridgesqs.json                                 367.0 ± 0%   367.0 ± 0%       ~ (p=0.966 n=10)
http-api.json                                       434.0 ± 0%   434.0 ± 0%       ~ (p=0.746 n=10)
kinesis-batch.json                                  392.0 ± 0%   393.0 ± 0%       ~ (p=0.082 n=10)
kinesis.json                                        286.0 ± 0%   287.0 ± 0%  +0.35% (p=0.003 n=10)
s3.json                                             359.0 ± 1%   360.0 ± 1%       ~ (p=0.337 n=10)
sns-batch.json                                      477.5 ± 1%   479.5 ± 0%       ~ (p=0.123 n=10)
sns.json                                            346.5 ± 1%   347.0 ± 1%       ~ (p=0.287 n=10)
snssqs.json                                         478.0 ± 1%   479.0 ± 0%       ~ (p=0.288 n=10)
snssqs_no_dd_context.json                           438.0 ± 1%   438.0 ± 0%       ~ (p=0.673 n=10)
sqs-aws-header.json                                 286.0 ± 1%   285.5 ± 1%       ~ (p=0.922 n=10)
sqs-batch.json                                      517.0 ± 1%   516.0 ± 1%       ~ (p=0.320 n=10)
sqs.json                                            365.5 ± 1%   364.0 ± 1%       ~ (p=0.314 n=10)
sqs_no_dd_context.json                              348.5 ± 1%   350.0 ± 1%       ~ (p=0.534 n=10)
stepfunction.json                                   237.5 ± 1%   237.0 ± 0%       ~ (p=0.408 n=10)
geomean                                             367.0        367.6       +0.16%
¹ all samples are equal

Nov 07 '24 19:11 github-actions[bot]

/merge

Nov 07 '24 20:11 agocs

Devflow running: `/merge`

View all feedbacks in Devflow UI.

2024-11-07 20:57:42 UTC :information_source: MergeQueue: pull request added to the queue

The median merge time in main is 24m.

Nov 07 '24 20:11 dd-devflow[bot]
