dd-trace-java
Track high watermark offsets
What Does This Do
Track high watermark offsets alongside produce and commit offsets. Comparing a partition's high watermark offset with the consumer's commit offset gives the consumer's Kafka lag. As a result, Kafka lag can now be obtained by instrumenting only the consumer service, with no instrumentation on the producer side.
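As a rough illustration of the lag computation described above, here is a minimal sketch. It assumes lag is taken per partition as the high watermark offset minus the committed offset. The class name `KafkaLagSketch`, its methods, and the `orders-0` partition key are hypothetical and are not part of dd-trace-java.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaLagSketch {

    /**
     * Lag for a single partition: offsets available in the log (up to the
     * high watermark) that the consumer group has not committed yet.
     * Clamped at zero in case the two values were sampled at different times.
     */
    static long lag(long highWatermarkOffset, long committedOffset) {
        return Math.max(0L, highWatermarkOffset - committedOffset);
    }

    /**
     * Computes lag for every partition that has both a commit offset and a
     * high watermark offset. Keys are hypothetical "topic-partition" strings.
     */
    static Map<String, Long> lagByPartition(
            Map<String, Long> highWatermarks, Map<String, Long> commits) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> commit : commits.entrySet()) {
            Long highWatermark = highWatermarks.get(commit.getKey());
            if (highWatermark != null) {
                result.put(commit.getKey(), lag(highWatermark, commit.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> highWatermarks = new HashMap<>();
        highWatermarks.put("orders-0", 120L);

        Map<String, Long> commits = new HashMap<>();
        commits.put("orders-0", 100L);

        // Prints {orders-0=20}
        System.out.println(lagByPartition(highWatermarks, commits));
    }
}
```

Both inputs are observable from the consumer side, which is why no producer instrumentation is needed for this calculation.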
Motivation
Jira ticket: [PROJ-IDENT]
Benchmarks
Startup
Parameters
| | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | piotr-wolski/add-high-watermark |
| git_commit_date | 1705694656 | 1705724754 |
| git_commit_sha | fcb4a55f20 | c25937a4cc |
| release_version | 1.29.0-SNAPSHOT~fcb4a55f20 | 1.28.0-SNAPSHOT~c25937a4cc |
See matching parameters
| | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1705727774 | 1705727774 |
| ci_job_id | 413836885 | 413836885 |
| ci_pipeline_id | 26879332 | 26879332 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| module | Agent | Agent |
| parent | None | None |
| variant | iast | iast |
Summary
Found 1 performance improvement and 3 performance regressions! Performance is the same for 40 metrics, 10 unstable metrics.
| scenario | Δ mean execution_time | candidate mean execution_time | baseline mean execution_time |
|---|---|---|---|
| scenario:startup:insecure-bank:iast_TELEMETRY_OFF:AppSec | better [-7.728ms; -3.445ms] or [-13.989%; -6.236%] | 49.653ms | 55.239ms |
| scenario:startup:insecure-bank:tracing:GlobalTracer | worse [+8.638ms; +17.397ms] or [+2.919%; +5.878%] | 308.978ms | 295.961ms |
| scenario:startup:petclinic:appsec:GlobalTracer | worse [+9.887ms; +19.982ms] or [+3.341%; +6.752%] | 310.863ms | 295.928ms |
| scenario:startup:petclinic:tracing:GlobalTracer | worse [+6.462ms; +13.744ms] or [+2.173%; +4.622%] | 307.460ms | 297.357ms |
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.28.0-SNAPSHOT~c25937a4cc, baseline=1.29.0-SNAPSHOT~fcb4a55f20
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1059919
Total [baseline] (9.471 s) : 0, 9470649
Agent [candidate] (1.049 s) : 0, 1049455
Total [candidate] (9.339 s) : 0, 9338601
section appsec
Agent [baseline] (1.153 s) : 0, 1152902
Total [baseline] (9.48 s) : 0, 9480437
Agent [candidate] (1.162 s) : 0, 1162253
Total [candidate] (9.503 s) : 0, 9503410
section iast
Agent [baseline] (1.195 s) : 0, 1194624
Total [baseline] (9.647 s) : 0, 9646992
Agent [candidate] (1.179 s) : 0, 1178642
Total [candidate] (9.649 s) : 0, 9649215
section profiling
Agent [baseline] (1.277 s) : 0, 1277287
Total [baseline] (9.603 s) : 0, 9603415
Agent [candidate] (1.272 s) : 0, 1272488
Total [candidate] (9.594 s) : 0, 9593981
- baseline results
| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.06 s | - |
| Agent | appsec | 1.153 s | 92.983 ms (8.8%) |
| Agent | iast | 1.195 s | 134.705 ms (12.7%) |
| Agent | profiling | 1.277 s | 217.368 ms (20.5%) |
| Total | tracing | 9.471 s | - |
| Total | appsec | 9.48 s | 9.787 ms (0.1%) |
| Total | iast | 9.647 s | 176.342 ms (1.9%) |
| Total | profiling | 9.603 s | 132.765 ms (1.4%) |
- candidate results
| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.049 s | - |
| Agent | appsec | 1.162 s | 112.798 ms (10.7%) |
| Agent | iast | 1.179 s | 129.187 ms (12.3%) |
| Agent | profiling | 1.272 s | 223.034 ms (21.3%) |
| Total | tracing | 9.339 s | - |
| Total | appsec | 9.503 s | 164.809 ms (1.8%) |
| Total | iast | 9.649 s | 310.614 ms (3.3%) |
| Total | profiling | 9.594 s | 255.38 ms (2.7%) |
gantt
title petclinic - break down per module: candidate=1.28.0-SNAPSHOT~c25937a4cc, baseline=1.29.0-SNAPSHOT~fcb4a55f20
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (669.455 ms) : 0, 669455
BytebuddyAgent [candidate] (649.218 ms) : 0, 649218
GlobalTracer [baseline] (297.357 ms) : 0, 297357
GlobalTracer [candidate] (307.46 ms) : 0, 307460
AppSec [baseline] (50.736 ms) : 0, 50736
AppSec [candidate] (50.643 ms) : 0, 50643
Remote Config [baseline] (663.158 µs) : 0, 663
Remote Config [candidate] (674.114 µs) : 0, 674
Telemetry [baseline] (7.242 ms) : 0, 7242
Telemetry [candidate] (7.249 ms) : 0, 7249
section appsec
BytebuddyAgent [baseline] (666.638 ms) : 0, 666638
BytebuddyAgent [candidate] (659.107 ms) : 0, 659107
GlobalTracer [baseline] (295.928 ms) : 0, 295928
GlobalTracer [candidate] (310.863 ms) : 0, 310863
AppSec [baseline] (148.475 ms) : 0, 148475
AppSec [candidate] (149.932 ms) : 0, 149932
Remote Config [baseline] (643.743 µs) : 0, 644
Remote Config [candidate] (658.353 µs) : 0, 658
Telemetry [baseline] (6.9 ms) : 0, 6900
Telemetry [candidate] (7.009 ms) : 0, 7009
section iast
BytebuddyAgent [baseline] (788.024 ms) : 0, 788024
BytebuddyAgent [candidate] (776.13 ms) : 0, 776130
GlobalTracer [baseline] (291.033 ms) : 0, 291033
GlobalTracer [candidate] (288.0 ms) : 0, 288000
AppSec [baseline] (52.92 ms) : 0, 52920
AppSec [candidate] (49.782 ms) : 0, 49782
Remote Config [baseline] (634.999 µs) : 0, 635
Remote Config [candidate] (567.755 µs) : 0, 568
Telemetry [baseline] (7.545 ms) : 0, 7545
Telemetry [candidate] (6.512 ms) : 0, 6512
IAST [baseline] (19.486 ms) : 0, 19486
IAST [candidate] (23.129 ms) : 0, 23129
section profiling
BytebuddyAgent [baseline] (662.93 ms) : 0, 662930
BytebuddyAgent [candidate] (660.825 ms) : 0, 660825
GlobalTracer [baseline] (376.465 ms) : 0, 376465
GlobalTracer [candidate] (375.424 ms) : 0, 375424
AppSec [baseline] (51.28 ms) : 0, 51280
AppSec [candidate] (51.022 ms) : 0, 51022
Remote Config [baseline] (988.611 µs) : 0, 989
Remote Config [candidate] (1.009 ms) : 0, 1009
Telemetry [baseline] (7.288 ms) : 0, 7288
Telemetry [candidate] (7.167 ms) : 0, 7167
ProfilingAgent [baseline] (123.86 ms) : 0, 123860
ProfilingAgent [candidate] (122.801 ms) : 0, 122801
Profiling [baseline] (123.887 ms) : 0, 123887
Profiling [candidate] (122.827 ms) : 0, 122827
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.28.0-SNAPSHOT~c25937a4cc, baseline=1.29.0-SNAPSHOT~fcb4a55f20
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.055 s) : 0, 1055285
Total [baseline] (8.742 s) : 0, 8741766
Agent [candidate] (1.057 s) : 0, 1056648
Total [candidate] (8.728 s) : 0, 8727608
section iast
Agent [baseline] (1.178 s) : 0, 1178318
Total [baseline] (9.314 s) : 0, 9314329
Agent [candidate] (1.172 s) : 0, 1172436
Total [candidate] (9.275 s) : 0, 9275058
section iast_TELEMETRY_OFF
Agent [baseline] (1.165 s) : 0, 1165245
Total [baseline] (9.239 s) : 0, 9239481
Agent [candidate] (1.17 s) : 0, 1170140
Total [candidate] (9.224 s) : 0, 9224129
- baseline results
| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.055 s | - |
| Agent | iast | 1.178 s | 123.033 ms (11.7%) |
| Agent | iast_TELEMETRY_OFF | 1.165 s | 109.96 ms (10.4%) |
| Total | tracing | 8.742 s | - |
| Total | iast | 9.314 s | 572.563 ms (6.5%) |
| Total | iast_TELEMETRY_OFF | 9.239 s | 497.715 ms (5.7%) |
- candidate results
| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.057 s | - |
| Agent | iast | 1.172 s | 115.788 ms (11.0%) |
| Agent | iast_TELEMETRY_OFF | 1.17 s | 113.492 ms (10.7%) |
| Total | tracing | 8.728 s | - |
| Total | iast | 9.275 s | 547.45 ms (6.3%) |
| Total | iast_TELEMETRY_OFF | 9.224 s | 496.521 ms (5.7%) |
gantt
title insecure-bank - break down per module: candidate=1.28.0-SNAPSHOT~c25937a4cc, baseline=1.29.0-SNAPSHOT~fcb4a55f20
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (666.412 ms) : 0, 666412
BytebuddyAgent [candidate] (654.125 ms) : 0, 654125
GlobalTracer [baseline] (295.961 ms) : 0, 295961
GlobalTracer [candidate] (308.978 ms) : 0, 308978
AppSec [baseline] (50.61 ms) : 0, 50610
AppSec [candidate] (51.185 ms) : 0, 51185
Remote Config [baseline] (671.163 µs) : 0, 671
Remote Config [candidate] (676.966 µs) : 0, 677
Telemetry [baseline] (7.286 ms) : 0, 7286
Telemetry [candidate] (7.239 ms) : 0, 7239
section iast
BytebuddyAgent [baseline] (774.269 ms) : 0, 774269
BytebuddyAgent [candidate] (772.091 ms) : 0, 772091
GlobalTracer [baseline] (288.777 ms) : 0, 288777
GlobalTracer [candidate] (286.808 ms) : 0, 286808
AppSec [baseline] (53.177 ms) : 0, 53177
AppSec [candidate] (52.32 ms) : 0, 52320
IAST [baseline] (19.664 ms) : 0, 19664
IAST [candidate] (19.725 ms) : 0, 19725
Remote Config [baseline] (618.73 µs) : 0, 619
Remote Config [candidate] (568.245 µs) : 0, 568
Telemetry [baseline] (7.427 ms) : 0, 7427
Telemetry [candidate] (6.49 ms) : 0, 6490
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (764.968 ms) : 0, 764968
BytebuddyAgent [candidate] (769.086 ms) : 0, 769086
GlobalTracer [baseline] (285.618 ms) : 0, 285618
GlobalTracer [candidate] (287.847 ms) : 0, 287847
AppSec [baseline] (55.239 ms) : 0, 55239
AppSec [candidate] (49.653 ms) : 0, 49653
IAST [baseline] (18.242 ms) : 0, 18242
IAST [candidate] (21.237 ms) : 0, 21237
Remote Config [baseline] (599.923 µs) : 0, 600
Remote Config [candidate] (1.299 ms) : 0, 1299
Telemetry [baseline] (6.349 ms) : 0, 6349
Telemetry [candidate] (6.51 ms) : 0, 6510
Load
Parameters
| | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| end_time | 2024-01-20T04:55:19 | 2024-01-20T05:11:57 |
| git_branch | master | piotr-wolski/add-high-watermark |
| git_commit_date | 1705694656 | 1705724754 |
| git_commit_sha | fcb4a55f20 | c25937a4cc |
| release_version | 1.29.0-SNAPSHOT~fcb4a55f20 | 1.28.0-SNAPSHOT~c25937a4cc |
| start_time | 2024-01-20T04:55:06 | 2024-01-20T05:11:44 |
See matching parameters
| | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1705727774 | 1705727774 |
| ci_job_id | 413836885 | 413836885 |
| ci_pipeline_id | 26879332 | 26879332 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| variant | iast | iast |
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 8 metrics, 14 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.28.0-SNAPSHOT~c25937a4cc, baseline=1.29.0-SNAPSHOT~fcb4a55f20
dateFormat X
axisFormat %s
section baseline
no_agent (1.344 ms) : 1325, 1363
. : milestone, 1344,
appsec (1.787 ms) : 1762, 1813
. : milestone, 1787,
iast (1.524 ms) : 1500, 1549
. : milestone, 1524,
profiling (1.515 ms) : 1490, 1540
. : milestone, 1515,
tracing (1.495 ms) : 1471, 1520
. : milestone, 1495,
section candidate
no_agent (1.352 ms) : 1333, 1371
. : milestone, 1352,
appsec (1.779 ms) : 1753, 1804
. : milestone, 1779,
iast (1.514 ms) : 1489, 1538
. : milestone, 1514,
profiling (1.522 ms) : 1497, 1547
. : milestone, 1522,
tracing (1.512 ms) : 1487, 1537
. : milestone, 1512,
- baseline results
| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.344 ms [1.325 ms, 1.363 ms] | - |
| appsec | 1.787 ms [1.762 ms, 1.813 ms] | 443.634 µs (33.0%) |
| iast | 1.524 ms [1.5 ms, 1.549 ms] | 180.406 µs (13.4%) |
| profiling | 1.515 ms [1.49 ms, 1.54 ms] | 171.289 µs (12.7%) |
| tracing | 1.495 ms [1.471 ms, 1.52 ms] | 151.374 µs (11.3%) |
- candidate results
| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.352 ms [1.333 ms, 1.371 ms] | - |
| appsec | 1.779 ms [1.753 ms, 1.804 ms] | 426.859 µs (31.6%) |
| iast | 1.514 ms [1.489 ms, 1.538 ms] | 161.881 µs (12.0%) |
| profiling | 1.522 ms [1.497 ms, 1.547 ms] | 170.339 µs (12.6%) |
| tracing | 1.512 ms [1.487 ms, 1.537 ms] | 159.932 µs (11.8%) |
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.28.0-SNAPSHOT~c25937a4cc, baseline=1.29.0-SNAPSHOT~fcb4a55f20
dateFormat X
axisFormat %s
section baseline
no_agent (361.449 µs) : 342, 381
. : milestone, 361,
iast (477.627 µs) : 457, 498
. : milestone, 478,
iast_FULL (548.129 µs) : 527, 569
. : milestone, 548,
iast_INACTIVE (451.63 µs) : 430, 473
. : milestone, 452,
iast_TELEMETRY_OFF (469.503 µs) : 449, 490
. : milestone, 470,
tracing (443.754 µs) : 423, 465
. : milestone, 444,
section candidate
no_agent (370.102 µs) : 350, 390
. : milestone, 370,
iast (480.12 µs) : 459, 501
. : milestone, 480,
iast_FULL (550.931 µs) : 531, 571
. : milestone, 551,
iast_INACTIVE (453.108 µs) : 431, 475
. : milestone, 453,
iast_TELEMETRY_OFF (476.512 µs) : 455, 498
. : milestone, 477,
tracing (441.924 µs) : 421, 463
. : milestone, 442,
- baseline results
| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 361.449 µs [341.874 µs, 381.024 µs] | - |
| iast | 477.627 µs [457.353 µs, 497.901 µs] | 116.179 µs (32.1%) |
| iast_FULL | 548.129 µs [527.056 µs, 569.202 µs] | 186.68 µs (51.6%) |
| iast_INACTIVE | 451.63 µs [430.146 µs, 473.113 µs] | 90.181 µs (24.9%) |
| iast_TELEMETRY_OFF | 469.503 µs [449.082 µs, 489.923 µs] | 108.054 µs (29.9%) |
| tracing | 443.754 µs [422.997 µs, 464.51 µs] | 82.305 µs (22.8%) |
- candidate results
| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 370.102 µs [350.323 µs, 389.881 µs] | - |
| iast | 480.12 µs [459.304 µs, 500.935 µs] | 110.017 µs (29.7%) |
| iast_FULL | 550.931 µs [530.542 µs, 571.32 µs] | 180.829 µs (48.9%) |
| iast_INACTIVE | 453.108 µs [431.291 µs, 474.925 µs] | 83.005 µs (22.4%) |
| iast_TELEMETRY_OFF | 476.512 µs [455.231 µs, 497.794 µs] | 106.41 µs (28.8%) |
| tracing | 441.924 µs [420.938 µs, 462.911 µs] | 71.822 µs (19.4%) |
Kafka / producer-benchmark
Parameters
| | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | piotr-wolski/add-high-watermark |
| git_commit_date | 1704464857 | 1705724754 |
| git_commit_sha | 260cceba3999a7c1f7bf1ccc2c4023556dca8463 | c25937a4cc00aa1acad35023f12760da77f25cff |
See matching parameters
| | Baseline | Candidate |
|---|---|---|
| ci_job_date | 1705726286 | 1705726286 |
| ci_job_id | 413836886 | 413836886 |
| ci_pipeline_id | 26879332 | 26879332 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| jdkVersion | 11.0.21 | 11.0.21 |
| jmhVersion | 1.36 | 1.36 |
| jvm | /usr/lib/jvm/java-11-openjdk-amd64/bin/java | /usr/lib/jvm/java-11-openjdk-amd64/bin/java |
| jvmArgs | -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant | -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant |
| vmName | OpenJDK 64-Bit Server VM | OpenJDK 64-Bit Server VM |
| vmVersion | 11.0.21+9-post-Ubuntu-0ubuntu122.04 | 11.0.21+9-post-Ubuntu-0ubuntu122.04 |
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.
See unchanged results
| scenario | Δ mean throughput |
|---|---|
| scenario:not-instrumented/KafkaProduceBenchmark.benchProduce | unsure [-44690.245op/s; -3140.556op/s] or [-2.455%; -0.173%] |
| scenario:only-tracing-dsm-disabled-benchmarks/KafkaProduceBenchmark.benchProduce | same |
| scenario:only-tracing-dsm-enabled-benchmarks/KafkaProduceBenchmark.benchProduce | same |
Kafka / consumer-benchmark
Parameters
| | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | piotr-wolski/add-high-watermark |
| git_commit_date | 1704464857 | 1705724754 |
| git_commit_sha | 260cceba3999a7c1f7bf1ccc2c4023556dca8463 | c25937a4cc00aa1acad35023f12760da77f25cff |
See matching parameters
| | Baseline | Candidate |
|---|---|---|
| ci_job_date | 1705726325 | 1705726325 |
| ci_job_id | 413836887 | 413836887 |
| ci_pipeline_id | 26879332 | 26879332 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| jdkVersion | 11.0.21 | 11.0.21 |
| jmhVersion | 1.36 | 1.36 |
| jvm | /usr/lib/jvm/java-11-openjdk-amd64/bin/java | /usr/lib/jvm/java-11-openjdk-amd64/bin/java |
| jvmArgs | -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant | -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant |
| vmName | OpenJDK 64-Bit Server VM | OpenJDK 64-Bit Server VM |
| vmVersion | 11.0.21+9-post-Ubuntu-0ubuntu122.04 | 11.0.21+9-post-Ubuntu-0ubuntu122.04 |
Summary
Found 0 performance improvements and 1 performance regression! Performance is the same for 2 metrics, 0 unstable metrics.
| scenario | Δ mean throughput |
|---|---|
| scenario:only-tracing-dsm-enabled-benchmarks/KafkaConsumerBenchmark.benchConsume | worse [-19781.802op/s; -6371.470op/s] or [-6.405%; -2.063%] |
See unchanged results
| scenario | Δ mean throughput |
|---|---|
| scenario:not-instrumented/KafkaConsumerBenchmark.benchConsume | same |
| scenario:only-tracing-dsm-disabled-benchmarks/KafkaConsumerBenchmark.benchConsume | same |
But I didn't find a way to hook into a place that is updated regularly.
Did you find a place but were unable to hook into it, or were you unable to find the right place at all?
Ah, @kr-igor suggested using reflection to access the high watermark offset. I did that and added the instrumentation in the same place where we capture commit offsets. The benefit is that computing lag requires both commit offsets and high watermark offsets, and both are now captured in the same place.
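For readers unfamiliar with the reflection approach mentioned here, the following is a minimal sketch of reading a private numeric field reflectively. `FakePartitionState` and its `highWatermark` field are hypothetical stand-ins. The actual Kafka client class and field used in this PR are not shown in this thread, so nothing below should be read as the real client internals or as the code in this change.

```java
import java.lang.reflect.Field;

public class ReflectiveHighWatermarkSketch {

    // Hypothetical stand-in for a client-internal class; NOT a real Kafka type.
    static final class FakePartitionState {
        private final long highWatermark;

        FakePartitionState(long highWatermark) {
            this.highWatermark = highWatermark;
        }
    }

    /**
     * Reads a private long field by name. Returns -1 when the field is
     * missing or cannot be made accessible, so callers can skip reporting
     * instead of failing the instrumented application.
     */
    static long readLongField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.getLong(target);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        FakePartitionState state = new FakePartitionState(120L);
        long highWatermark = readLongField(state, "highWatermark");
        long committedOffset = 100L; // assumed to be captured at the same point

        if (highWatermark >= 0) {
            // Prints lag=20
            System.out.println("lag=" + (highWatermark - committedOffset));
        }
    }
}
```

In real instrumentation the `Field` lookup would normally be cached rather than repeated on every call, since reflective lookups add overhead on a hot path.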
Closing for now in favor of: https://github.com/DataDog/integrations-core/pull/16889