opentelemetry-rust perf: Suppress telemetry using ContextFlags(usize) instead of bool

Changes

This PR aligns the Context struct and changes a bool into a flag field. It tries to mitigate the performance impact of #2821 on context attach/detach operations.

Merge requirement checklist

[x] CONTRIBUTING guidelines followed
[ ] Unit tests added/updated (if applicable)
[ ] Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
[ ] Changes in public API reviewed (if applicable)

Mar 25 '25 14:03 bantonsson

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 80.6%. Comparing base (353bbb0) to head (55cbad0).

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #2861   +/-   ##
=====================================
  Coverage   80.6%   80.6%           
=====================================
  Files        126     126           
  Lines      22195   22201    +6     
=====================================
+ Hits       17898   17904    +6     
  Misses      4297    4297

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Mar 25 '25 14:03 codecov[bot]

👷 build bot output from this run:

name	baseDuration	changesDuration	difference
context_attach/nested_cx/empty_cx	31.7±0.14ns	19.1±0.27ns	-40
context_attach/nested_cx/single_value_cx	33.7±0.22ns	19.4±0.44ns	-42
context_attach/nested_cx/span_cx	33.0±0.29ns	19.4±0.30ns	-41
context_attach/out_of_order_cx_drop/empty_cx	38.7±0.18ns	19.8±0.19ns	-49
context_attach/out_of_order_cx_drop/single_value_cx	40.1±0.13ns	20.8±0.48ns	-48
context_attach/out_of_order_cx_drop/span_cx	39.7±0.22ns	20.8±0.32ns	-48
context_attach/single_cx/empty_cx	18.1±0.22ns	12.5±0.31ns	-31
context_attach/single_cx/single_value_cx	17.6±0.08ns	12.5±0.13ns	-29
context_attach/single_cx/span_cx	17.3±0.13ns	12.5±0.25ns	-28

Mar 26 '25 12:03 scottgerring

~~These are the benchmark numbers from this run:~~

These numbers are not relevant since the code has changed completely.

Mar 27 '25 10:03 bantonsson

This is still a draft until #2870 has been merged and all benchmarks are run properly.

Mar 28 '25 14:03 bantonsson

New performance numbers 2025-05-06 from this run:

name	baseDuration	changesDuration	difference
`context/has_active_span/in-cx/alt`	`8.4±0.04ns`	`8.4±0.07ns`	`0.0`
`context/has_active_span/in-cx/spec`	`5.0±0.12ns`	`5.0±0.12ns`	`0.0`
`context/has_active_span/no-cx/alt`	`8.4±0.02ns`	`8.4±0.03ns`	`0.0`
`context/has_active_span/no-cx/spec`	`5.0±0.23ns`	`4.7±0.14ns`	`-6.5`
`context/has_active_span/no-sdk/alt`	`8.4±0.05ns`	`8.4±0.03ns`	`0.0`
`context/has_active_span/no-sdk/spec`	`5.0±0.31ns`	`4.7±0.34ns`	`-6.5`
`context/is_recording/in-cx/alt`	`4.7±0.14ns`	`4.7±0.14ns`	`0.0`
`context/is_recording/in-cx/spec`	`7.5±0.28ns`	`7.5±0.30ns`	`0.0`
`context/is_recording/no-cx/alt`	`4.7±0.14ns`	`4.7±0.19ns`	`0.0`
`context/is_recording/no-cx/spec`	`7.2±0.26ns`	`7.2±0.14ns`	`0.0`
`context/is_recording/no-sdk/alt`	`4.7±0.14ns`	`4.7±0.10ns`	`0.0`
`context/is_recording/no-sdk/spec`	`7.2±0.30ns`	`7.2±0.25ns`	`0.0`
`context/is_sampled/in-cx/alt`	`8.7±0.04ns`	`8.7±0.04ns`	`0.0`
`context/is_sampled/in-cx/spec`	`5.4±0.13ns`	`5.3±0.16ns`	`-0.99`
`context/is_sampled/no-cx/alt`	`8.7±0.04ns`	`8.7±0.28ns`	`0.0`
`context/is_sampled/no-cx/spec`	`5.0±0.29ns`	`5.0±0.37ns`	`0.0`
`context/is_sampled/no-sdk/alt`	`8.7±0.06ns`	`8.7±0.02ns`	`0.0`
`context/is_sampled/no-sdk/spec`	`5.0±0.18ns`	`5.0±0.24ns`	`0.0`
`context_attach/nested_cx/empty_cx`	`47.4±1.14ns`	`39.3±2.27ns`	`-17`
`context_attach/nested_cx/single_value_cx`	`48.4±1.10ns`	`42.8±1.22ns`	`-12`
`context_attach/nested_cx/span_cx`	`48.3±0.68ns`	`42.6±1.47ns`	`-12`
`context_attach/out_of_order_cx_drop/empty_cx`	`41.6±0.49ns`	`40.7±1.16ns`	`-2.0`
`context_attach/out_of_order_cx_drop/single_value_cx`	`42.5±3.01ns`	`42.3±2.05ns`	`0.0`
`context_attach/out_of_order_cx_drop/span_cx`	`42.7±1.23ns`	`42.1±1.82ns`	`-0.99`
`context_attach/single_cx/empty_cx`	`24.1±0.43ns`	`19.5±0.67ns`	`-19`
`context_attach/single_cx/single_value_cx`	`23.9±2.92ns`	`23.4±0.60ns`	`-2.0`
`context_attach/single_cx/span_cx`	`23.1±0.58ns`	`23.1±0.63ns`	`0.0`
`telemetry_suppression/enter_telemetry_suppressed_scope`	`27.2±0.67ns`	`25.2±0.27ns`	`-7.4`
`telemetry_suppression/is_current_telemetry_suppressed_false`	`1.4±0.02ns`	`1.3±0.05ns`	`-5.7`
`telemetry_suppression/is_current_telemetry_suppressed_true`	`1.4±0.02ns`	`1.3±0.02ns`	`-6.5`
`telemetry_suppression/normal_attach`	`30.5±0.63ns`	`28.2±1.04ns`	`-7.4`

Apr 04 '25 07:04 bantonsson

@bantonsson can you run the bench in your machine and see if you are also observing the same? I am seeing regression in my laptop. There are improvements to attach ones anyway, so we should still proceed with this PR, but I am curious how much we can trust the bench results from the CI machines!

telemetry_suppression/enter_telemetry_suppressed_scope time: [10.170 ns 10.198 ns 10.224 ns] change: [+8.3229% +8.8793% +9.4188%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 3 (3.00%) low mild 1 (1.00%) high severe telemetry_suppression/normal_attach time: [11.386 ns 11.440 ns 11.495 ns] change: [+9.0850% +9.5688% +10.094%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) low mild 2 (2.00%) high mild telemetry_suppression/is_current_telemetry_suppressed_false time: [729.10 ps 731.32 ps 733.54 ps] change: [-2.4232% -1.9738% -1.5230%] (p = 0.00 < 0.05) Performance has improved. Found 20 outliers among 100 measurements (20.00%) 4 (4.00%) low severe 8 (8.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe telemetry_suppression/is_current_telemetry_suppressed_true time: [730.30 ps 732.13 ps 734.16 ps] change: [-1.7477% -1.2922% -0.7619%] (p = 0.00 < 0.05) Change within noise threshold.

Apr 04 '25 15:04 cijothomas

@cijothomas These are the numbers from my M1 Max Laptop, where I see a slight regression in checks and large improvements in entering.

Benchmarking telemetry_suppression/enter_telemetry_suppressed_scope: Collecting 100 samples in estimated 2.0001 s (135M itelemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [14.479 ns 14.511 ns 14.549 ns]
                        change: [-21.316% -20.968% -20.611%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe
telemetry_suppression/normal_attach
                        time:   [15.431 ns 15.483 ns 15.550 ns]
                        change: [-18.631% -18.300% -17.951%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_false: Collecting 100 samples in estimated 2.0000 s (1telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [1.1357 ns 1.1436 ns 1.1545 ns]
                        change: [+2.1229% +2.7044% +3.3306%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_true: Collecting 100 samples in estimated 2.0000 s (1.telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [1.1311 ns 1.1367 ns 1.1438 ns]
                        change: [+1.1998% +1.6296% +2.0577%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  14 (14.00%) high severe

And these are the numbers from my AMD Ryzen 5 3600 2.2GHz box, where I see a slight improvement in checks and large improvements in entering.

Benchmarking telemetry_suppression/enter_telemetry_suppressed_scope: Collecting 100 samples in estimated 2.0001 s (85M iteratio
telemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [23.078 ns 23.081 ns 23.083 ns]
                        change: [-16.623% -16.588% -16.558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
telemetry_suppression/normal_attach
                        time:   [24.008 ns 24.024 ns 24.039 ns]
                        change: [-17.823% -17.631% -17.412%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_false: Collecting 100 samples in estimated 2.0000 s (2.8B it
telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [715.65 ps 715.73 ps 715.82 ps]
                        change: [-1.7774% -1.7447% -1.7127%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_true: Collecting 100 samples in estimated 2.0000 s (2.8B ite
telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [715.54 ps 715.63 ps 715.73 ps]
                        change: [-3.6408% -3.0627% -2.5393%] (p = 0.00 < 0.05)
                        Performance has improved.

Apr 07 '25 07:04 bantonsson

The benchmark numbers comment have been updated with the latest run.

May 06 '25 16:05 bantonsson

It would be great if there could be some extra manual testing of this @cijothomas @lalitb so we can come to a conclusion.

May 14 '25 12:05 bantonsson

telemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [10.502 ns 10.524 ns 10.546 ns]
                        change: [+12.347% +12.742% +13.119%] (p = 0.00 < 0.05)
                        Performance has regressed.
telemetry_suppression/normal_attach
                        time:   [11.106 ns 11.133 ns 11.166 ns]
                        change: [+8.4063% +8.8465% +9.3006%] (p = 0.00 < 0.05)
                        Performance has regressed.
telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [743.58 ps 744.81 ps 746.44 ps]
                        change: [-0.6749% -0.3983% -0.1260%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
  8 (8.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [743.35 ps 744.08 ps 745.10 ps]
                        change: [-0.4459% -0.1688% +0.0707%] (p = 0.22 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) low severe
  6 (6.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

I'm seeing consistent ~10% regression for enter, no-change for checks.

May 14 '25 14:05 cijothomas

That is interesting @cijothomas. What hardware are you running on?

May 14 '25 15:05 bantonsson

That is interesting @cijothomas. What hardware are you running on?

This was in m4 pro. I can try in a windows box and update back.

May 14 '25 22:05 cijothomas

@lalitb @utpilla could you also help run the benches for this to see if there is consistent improvement/regression?

May 22 '25 15:05 cijothomas

PR branch:

$ cargo bench --bench context_suppression
    Finished `bench` profile [optimized + debuginfo] target(s) in 0.13s
     Running benches/context_suppression.rs (/tmp/tbd_a/opentelemetry-rust/target/release/deps/context_suppression-3de0ed5c3fd35dc6)
Gnuplot not found, using plotters backend
telemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [26.002 ns 26.039 ns 26.080 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
telemetry_suppression/normal_attach
                        time:   [26.186 ns 26.201 ns 26.218 ns]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [1.2984 ns 1.3000 ns 1.3016 ns]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [1.2981 ns 1.3005 ns 1.3030 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

Machine Conf:

$ lscpu | grep -i core
Model name:                           AMD EPYC 7763 64-Core Processor
Thread(s) per core:                   2
Core(s) per socket:                   8

$ cat /proc/meminfo | grep -E '^MemTotal'
MemTotal:       65794236 kB

$ uname -a
Linux CPC-labha-5U0JP 6.6.36.3-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Sat Jun 29 07:01:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

May 22 '25 17:05 lalitb

I can't seem to get consistently good benchmark numbers across the different architectures, so I'll close this work for now.

Aug 25 '25 10:08 bantonsson

Just reopening for another small test.

Aug 26 '25 13:08 bantonsson

opentelemetry-rust opentelemetry-rust copied to clipboard

perf: Suppress telemetry using ContextFlags(usize) instead of bool

Changes

Merge requirement checklist

Codecov Report

opentelemetry-rust
opentelemetry-rust copied to clipboard