opentelemetry-rust
opentelemetry-rust copied to clipboard
perf: Suppress telemetry using ContextFlags(usize) instead of bool
Changes
This PR aligns the Context struct and changes a bool into a flag field. It tries to mitigate the performance impact of #2821 on context attach/detach operations.
Merge requirement checklist
- [x] CONTRIBUTING guidelines followed
- [ ] Unit tests added/updated (if applicable)
- [ ] Appropriate
CHANGELOG.mdfiles updated for non-trivial, user-facing changes - [ ] Changes in public API reviewed (if applicable)
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 80.6%. Comparing base (353bbb0) to head (55cbad0).
Additional details and impacted files
@@ Coverage Diff @@
## main #2861 +/- ##
=====================================
Coverage 80.6% 80.6%
=====================================
Files 126 126
Lines 22195 22201 +6
=====================================
+ Hits 17898 17904 +6
Misses 4297 4297
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
👷 build bot output from this run:
| name | baseDuration | changesDuration | difference |
|---|---|---|---|
| context_attach/nested_cx/empty_cx | 31.7±0.14ns | 19.1±0.27ns | -40 |
| context_attach/nested_cx/single_value_cx | 33.7±0.22ns | 19.4±0.44ns | -42 |
| context_attach/nested_cx/span_cx | 33.0±0.29ns | 19.4±0.30ns | -41 |
| context_attach/out_of_order_cx_drop/empty_cx | 38.7±0.18ns | 19.8±0.19ns | -49 |
| context_attach/out_of_order_cx_drop/single_value_cx | 40.1±0.13ns | 20.8±0.48ns | -48 |
| context_attach/out_of_order_cx_drop/span_cx | 39.7±0.22ns | 20.8±0.32ns | -48 |
| context_attach/single_cx/empty_cx | 18.1±0.22ns | 12.5±0.31ns | -31 |
| context_attach/single_cx/single_value_cx | 17.6±0.08ns | 12.5±0.13ns | -29 |
| context_attach/single_cx/span_cx | 17.3±0.13ns | 12.5±0.25ns | -28 |
~~These are the benchmark numbers from this run:~~
These numbers are not relevant since the code has changed completely.
This is still a draft until #2870 has been merged and all benchmarks are run properly.
New performance numbers 2025-05-06 from this run:
| name | baseDuration | changesDuration | difference |
|---|---|---|---|
context/has_active_span/in-cx/alt |
8.4±0.04ns |
8.4±0.07ns |
0.0 |
context/has_active_span/in-cx/spec |
5.0±0.12ns |
5.0±0.12ns |
0.0 |
context/has_active_span/no-cx/alt |
8.4±0.02ns |
8.4±0.03ns |
0.0 |
context/has_active_span/no-cx/spec |
5.0±0.23ns |
4.7±0.14ns |
-6.5 |
context/has_active_span/no-sdk/alt |
8.4±0.05ns |
8.4±0.03ns |
0.0 |
context/has_active_span/no-sdk/spec |
5.0±0.31ns |
4.7±0.34ns |
-6.5 |
context/is_recording/in-cx/alt |
4.7±0.14ns |
4.7±0.14ns |
0.0 |
context/is_recording/in-cx/spec |
7.5±0.28ns |
7.5±0.30ns |
0.0 |
context/is_recording/no-cx/alt |
4.7±0.14ns |
4.7±0.19ns |
0.0 |
context/is_recording/no-cx/spec |
7.2±0.26ns |
7.2±0.14ns |
0.0 |
context/is_recording/no-sdk/alt |
4.7±0.14ns |
4.7±0.10ns |
0.0 |
context/is_recording/no-sdk/spec |
7.2±0.30ns |
7.2±0.25ns |
0.0 |
context/is_sampled/in-cx/alt |
8.7±0.04ns |
8.7±0.04ns |
0.0 |
context/is_sampled/in-cx/spec |
5.4±0.13ns |
5.3±0.16ns |
-0.99 |
context/is_sampled/no-cx/alt |
8.7±0.04ns |
8.7±0.28ns |
0.0 |
context/is_sampled/no-cx/spec |
5.0±0.29ns |
5.0±0.37ns |
0.0 |
context/is_sampled/no-sdk/alt |
8.7±0.06ns |
8.7±0.02ns |
0.0 |
context/is_sampled/no-sdk/spec |
5.0±0.18ns |
5.0±0.24ns |
0.0 |
context_attach/nested_cx/empty_cx |
47.4±1.14ns |
39.3±2.27ns |
-17 |
context_attach/nested_cx/single_value_cx |
48.4±1.10ns |
42.8±1.22ns |
-12 |
context_attach/nested_cx/span_cx |
48.3±0.68ns |
42.6±1.47ns |
-12 |
context_attach/out_of_order_cx_drop/empty_cx |
41.6±0.49ns |
40.7±1.16ns |
-2.0 |
context_attach/out_of_order_cx_drop/single_value_cx |
42.5±3.01ns |
42.3±2.05ns |
0.0 |
context_attach/out_of_order_cx_drop/span_cx |
42.7±1.23ns |
42.1±1.82ns |
-0.99 |
context_attach/single_cx/empty_cx |
24.1±0.43ns |
19.5±0.67ns |
-19 |
context_attach/single_cx/single_value_cx |
23.9±2.92ns |
23.4±0.60ns |
-2.0 |
context_attach/single_cx/span_cx |
23.1±0.58ns |
23.1±0.63ns |
0.0 |
telemetry_suppression/enter_telemetry_suppressed_scope |
27.2±0.67ns |
25.2±0.27ns |
-7.4 |
telemetry_suppression/is_current_telemetry_suppressed_false |
1.4±0.02ns |
1.3±0.05ns |
-5.7 |
telemetry_suppression/is_current_telemetry_suppressed_true |
1.4±0.02ns |
1.3±0.02ns |
-6.5 |
telemetry_suppression/normal_attach |
30.5±0.63ns |
28.2±1.04ns |
-7.4 |
@bantonsson can you run the bench in your machine and see if you are also observing the same? I am seeing regression in my laptop. There are improvements to attach ones anyway, so we should still proceed with this PR, but I am curious how much we can trust the bench results from the CI machines!
telemetry_suppression/enter_telemetry_suppressed_scope time: [10.170 ns 10.198 ns 10.224 ns] change: [+8.3229% +8.8793% +9.4188%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 3 (3.00%) low mild 1 (1.00%) high severe telemetry_suppression/normal_attach time: [11.386 ns 11.440 ns 11.495 ns] change: [+9.0850% +9.5688% +10.094%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) low mild 2 (2.00%) high mild telemetry_suppression/is_current_telemetry_suppressed_false time: [729.10 ps 731.32 ps 733.54 ps] change: [-2.4232% -1.9738% -1.5230%] (p = 0.00 < 0.05) Performance has improved. Found 20 outliers among 100 measurements (20.00%) 4 (4.00%) low severe 8 (8.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe telemetry_suppression/is_current_telemetry_suppressed_true time: [730.30 ps 732.13 ps 734.16 ps] change: [-1.7477% -1.2922% -0.7619%] (p = 0.00 < 0.05) Change within noise threshold.
@cijothomas These are the numbers from my M1 Max Laptop, where I see a slight regression in checks and large improvements in entering.
Benchmarking telemetry_suppression/enter_telemetry_suppressed_scope: Collecting 100 samples in estimated 2.0001 s (135M itelemetry_suppression/enter_telemetry_suppressed_scope
time: [14.479 ns 14.511 ns 14.549 ns]
change: [-21.316% -20.968% -20.611%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe
telemetry_suppression/normal_attach
time: [15.431 ns 15.483 ns 15.550 ns]
change: [-18.631% -18.300% -17.951%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) low mild
1 (1.00%) high mild
5 (5.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_false: Collecting 100 samples in estimated 2.0000 s (1telemetry_suppression/is_current_telemetry_suppressed_false
time: [1.1357 ns 1.1436 ns 1.1545 ns]
change: [+2.1229% +2.7044% +3.3306%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_true: Collecting 100 samples in estimated 2.0000 s (1.telemetry_suppression/is_current_telemetry_suppressed_true
time: [1.1311 ns 1.1367 ns 1.1438 ns]
change: [+1.1998% +1.6296% +2.0577%] (p = 0.00 < 0.05)
Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
2 (2.00%) low severe
3 (3.00%) low mild
4 (4.00%) high mild
14 (14.00%) high severe
And these are the numbers from my AMD Ryzen 5 3600 2.2GHz box, where I see a slight improvement in checks and large improvements in entering.
Benchmarking telemetry_suppression/enter_telemetry_suppressed_scope: Collecting 100 samples in estimated 2.0001 s (85M iteratio
telemetry_suppression/enter_telemetry_suppressed_scope
time: [23.078 ns 23.081 ns 23.083 ns]
change: [-16.623% -16.588% -16.558%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
telemetry_suppression/normal_attach
time: [24.008 ns 24.024 ns 24.039 ns]
change: [-17.823% -17.631% -17.412%] (p = 0.00 < 0.05)
Performance has improved.
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_false: Collecting 100 samples in estimated 2.0000 s (2.8B it
telemetry_suppression/is_current_telemetry_suppressed_false
time: [715.65 ps 715.73 ps 715.82 ps]
change: [-1.7774% -1.7447% -1.7127%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_true: Collecting 100 samples in estimated 2.0000 s (2.8B ite
telemetry_suppression/is_current_telemetry_suppressed_true
time: [715.54 ps 715.63 ps 715.73 ps]
change: [-3.6408% -3.0627% -2.5393%] (p = 0.00 < 0.05)
Performance has improved.
The benchmark numbers comment have been updated with the latest run.
It would be great if there could be some extra manual testing of this @cijothomas @lalitb so we can come to a conclusion.
telemetry_suppression/enter_telemetry_suppressed_scope
time: [10.502 ns 10.524 ns 10.546 ns]
change: [+12.347% +12.742% +13.119%] (p = 0.00 < 0.05)
Performance has regressed.
telemetry_suppression/normal_attach
time: [11.106 ns 11.133 ns 11.166 ns]
change: [+8.4063% +8.8465% +9.3006%] (p = 0.00 < 0.05)
Performance has regressed.
telemetry_suppression/is_current_telemetry_suppressed_false
time: [743.58 ps 744.81 ps 746.44 ps]
change: [-0.6749% -0.3983% -0.1260%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
8 (8.00%) low severe
3 (3.00%) low mild
3 (3.00%) high mild
7 (7.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_true
time: [743.35 ps 744.08 ps 745.10 ps]
change: [-0.4459% -0.1688% +0.0707%] (p = 0.22 > 0.05)
No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
3 (3.00%) low severe
6 (6.00%) low mild
5 (5.00%) high mild
6 (6.00%) high severe
I'm seeing consistent ~10% regression for enter, no-change for checks.
That is interesting @cijothomas. What hardware are you running on?
That is interesting @cijothomas. What hardware are you running on?
This was in m4 pro. I can try in a windows box and update back.
@lalitb @utpilla could you also help run the benches for this to see if there is consistent improvement/regression?
PR branch:
$ cargo bench --bench context_suppression
Finished `bench` profile [optimized + debuginfo] target(s) in 0.13s
Running benches/context_suppression.rs (/tmp/tbd_a/opentelemetry-rust/target/release/deps/context_suppression-3de0ed5c3fd35dc6)
Gnuplot not found, using plotters backend
telemetry_suppression/enter_telemetry_suppressed_scope
time: [26.002 ns 26.039 ns 26.080 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
telemetry_suppression/normal_attach
time: [26.186 ns 26.201 ns 26.218 ns]
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_false
time: [1.2984 ns 1.3000 ns 1.3016 ns]
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_true
time: [1.2981 ns 1.3005 ns 1.3030 ns]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
Machine Conf:
$ lscpu | grep -i core
Model name: AMD EPYC 7763 64-Core Processor
Thread(s) per core: 2
Core(s) per socket: 8
$ cat /proc/meminfo | grep -E '^MemTotal'
MemTotal: 65794236 kB
$ uname -a
Linux CPC-labha-5U0JP 6.6.36.3-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Sat Jun 29 07:01:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
I can't seem to get consistently good benchmark numbers across the different architectures, so I'll close this work for now.
Just reopening for another small test.