opentelemetry-rust icon indicating copy to clipboard operation
opentelemetry-rust copied to clipboard

perf: Suppress telemetry using ContextFlags(usize) instead of bool

Open bantonsson opened this issue 8 months ago • 14 comments

Changes

This PR aligns the Context struct and changes a bool into a flag field. It tries to mitigate the performance impact of #2821 on context attach/detach operations.

Merge requirement checklist

  • [x] CONTRIBUTING guidelines followed
  • [ ] Unit tests added/updated (if applicable)
  • [ ] Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • [ ] Changes in public API reviewed (if applicable)

bantonsson avatar Mar 25 '25 14:03 bantonsson

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 80.6%. Comparing base (353bbb0) to head (55cbad0).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2861   +/-   ##
=====================================
  Coverage   80.6%   80.6%           
=====================================
  Files        126     126           
  Lines      22195   22201    +6     
=====================================
+ Hits       17898   17904    +6     
  Misses      4297    4297           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Mar 25 '25 14:03 codecov[bot]

👷 build bot output from this run:

name baseDuration changesDuration difference
context_attach/nested_cx/empty_cx 31.7±0.14ns 19.1±0.27ns -40
context_attach/nested_cx/single_value_cx 33.7±0.22ns 19.4±0.44ns -42
context_attach/nested_cx/span_cx 33.0±0.29ns 19.4±0.30ns -41
context_attach/out_of_order_cx_drop/empty_cx 38.7±0.18ns 19.8±0.19ns -49
context_attach/out_of_order_cx_drop/single_value_cx 40.1±0.13ns 20.8±0.48ns -48
context_attach/out_of_order_cx_drop/span_cx 39.7±0.22ns 20.8±0.32ns -48
context_attach/single_cx/empty_cx 18.1±0.22ns 12.5±0.31ns -31
context_attach/single_cx/single_value_cx 17.6±0.08ns 12.5±0.13ns -29
context_attach/single_cx/span_cx 17.3±0.13ns 12.5±0.25ns -28

scottgerring avatar Mar 26 '25 12:03 scottgerring

~~These are the benchmark numbers from this run:~~

These numbers are not relevant since the code has changed completely.

bantonsson avatar Mar 27 '25 10:03 bantonsson

This is still a draft until #2870 has been merged and all benchmarks are run properly.

bantonsson avatar Mar 28 '25 14:03 bantonsson

New performance numbers 2025-05-06 from this run:

name baseDuration changesDuration difference
context/has_active_span/in-cx/alt 8.4±0.04ns 8.4±0.07ns 0.0
context/has_active_span/in-cx/spec 5.0±0.12ns 5.0±0.12ns 0.0
context/has_active_span/no-cx/alt 8.4±0.02ns 8.4±0.03ns 0.0
context/has_active_span/no-cx/spec 5.0±0.23ns 4.7±0.14ns -6.5
context/has_active_span/no-sdk/alt 8.4±0.05ns 8.4±0.03ns 0.0
context/has_active_span/no-sdk/spec 5.0±0.31ns 4.7±0.34ns -6.5
context/is_recording/in-cx/alt 4.7±0.14ns 4.7±0.14ns 0.0
context/is_recording/in-cx/spec 7.5±0.28ns 7.5±0.30ns 0.0
context/is_recording/no-cx/alt 4.7±0.14ns 4.7±0.19ns 0.0
context/is_recording/no-cx/spec 7.2±0.26ns 7.2±0.14ns 0.0
context/is_recording/no-sdk/alt 4.7±0.14ns 4.7±0.10ns 0.0
context/is_recording/no-sdk/spec 7.2±0.30ns 7.2±0.25ns 0.0
context/is_sampled/in-cx/alt 8.7±0.04ns 8.7±0.04ns 0.0
context/is_sampled/in-cx/spec 5.4±0.13ns 5.3±0.16ns -0.99
context/is_sampled/no-cx/alt 8.7±0.04ns 8.7±0.28ns 0.0
context/is_sampled/no-cx/spec 5.0±0.29ns 5.0±0.37ns 0.0
context/is_sampled/no-sdk/alt 8.7±0.06ns 8.7±0.02ns 0.0
context/is_sampled/no-sdk/spec 5.0±0.18ns 5.0±0.24ns 0.0
context_attach/nested_cx/empty_cx 47.4±1.14ns 39.3±2.27ns -17
context_attach/nested_cx/single_value_cx 48.4±1.10ns 42.8±1.22ns -12
context_attach/nested_cx/span_cx 48.3±0.68ns 42.6±1.47ns -12
context_attach/out_of_order_cx_drop/empty_cx 41.6±0.49ns 40.7±1.16ns -2.0
context_attach/out_of_order_cx_drop/single_value_cx 42.5±3.01ns 42.3±2.05ns 0.0
context_attach/out_of_order_cx_drop/span_cx 42.7±1.23ns 42.1±1.82ns -0.99
context_attach/single_cx/empty_cx 24.1±0.43ns 19.5±0.67ns -19
context_attach/single_cx/single_value_cx 23.9±2.92ns 23.4±0.60ns -2.0
context_attach/single_cx/span_cx 23.1±0.58ns 23.1±0.63ns 0.0
telemetry_suppression/enter_telemetry_suppressed_scope 27.2±0.67ns 25.2±0.27ns -7.4
telemetry_suppression/is_current_telemetry_suppressed_false 1.4±0.02ns 1.3±0.05ns -5.7
telemetry_suppression/is_current_telemetry_suppressed_true 1.4±0.02ns 1.3±0.02ns -6.5
telemetry_suppression/normal_attach 30.5±0.63ns 28.2±1.04ns -7.4

bantonsson avatar Apr 04 '25 07:04 bantonsson

@bantonsson can you run the bench in your machine and see if you are also observing the same? I am seeing regression in my laptop. There are improvements to attach ones anyway, so we should still proceed with this PR, but I am curious how much we can trust the bench results from the CI machines!

telemetry_suppression/enter_telemetry_suppressed_scope time: [10.170 ns 10.198 ns 10.224 ns] change: [+8.3229% +8.8793% +9.4188%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 3 (3.00%) low mild 1 (1.00%) high severe telemetry_suppression/normal_attach time: [11.386 ns 11.440 ns 11.495 ns] change: [+9.0850% +9.5688% +10.094%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) low mild 2 (2.00%) high mild telemetry_suppression/is_current_telemetry_suppressed_false time: [729.10 ps 731.32 ps 733.54 ps] change: [-2.4232% -1.9738% -1.5230%] (p = 0.00 < 0.05) Performance has improved. Found 20 outliers among 100 measurements (20.00%) 4 (4.00%) low severe 8 (8.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe telemetry_suppression/is_current_telemetry_suppressed_true time: [730.30 ps 732.13 ps 734.16 ps] change: [-1.7477% -1.2922% -0.7619%] (p = 0.00 < 0.05) Change within noise threshold.

cijothomas avatar Apr 04 '25 15:04 cijothomas

@cijothomas These are the numbers from my M1 Max Laptop, where I see a slight regression in checks and large improvements in entering.

Benchmarking telemetry_suppression/enter_telemetry_suppressed_scope: Collecting 100 samples in estimated 2.0001 s (135M itelemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [14.479 ns 14.511 ns 14.549 ns]
                        change: [-21.316% -20.968% -20.611%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe
telemetry_suppression/normal_attach
                        time:   [15.431 ns 15.483 ns 15.550 ns]
                        change: [-18.631% -18.300% -17.951%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_false: Collecting 100 samples in estimated 2.0000 s (1telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [1.1357 ns 1.1436 ns 1.1545 ns]
                        change: [+2.1229% +2.7044% +3.3306%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_true: Collecting 100 samples in estimated 2.0000 s (1.telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [1.1311 ns 1.1367 ns 1.1438 ns]
                        change: [+1.1998% +1.6296% +2.0577%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  14 (14.00%) high severe

And these are the numbers from my AMD Ryzen 5 3600 2.2GHz box, where I see a slight improvement in checks and large improvements in entering.

Benchmarking telemetry_suppression/enter_telemetry_suppressed_scope: Collecting 100 samples in estimated 2.0001 s (85M iteratio
telemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [23.078 ns 23.081 ns 23.083 ns]
                        change: [-16.623% -16.588% -16.558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
telemetry_suppression/normal_attach
                        time:   [24.008 ns 24.024 ns 24.039 ns]
                        change: [-17.823% -17.631% -17.412%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_false: Collecting 100 samples in estimated 2.0000 s (2.8B it
telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [715.65 ps 715.73 ps 715.82 ps]
                        change: [-1.7774% -1.7447% -1.7127%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_true: Collecting 100 samples in estimated 2.0000 s (2.8B ite
telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [715.54 ps 715.63 ps 715.73 ps]
                        change: [-3.6408% -3.0627% -2.5393%] (p = 0.00 < 0.05)
                        Performance has improved.

bantonsson avatar Apr 07 '25 07:04 bantonsson

The benchmark numbers comment have been updated with the latest run.

bantonsson avatar May 06 '25 16:05 bantonsson

It would be great if there could be some extra manual testing of this @cijothomas @lalitb so we can come to a conclusion.

bantonsson avatar May 14 '25 12:05 bantonsson

telemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [10.502 ns 10.524 ns 10.546 ns]
                        change: [+12.347% +12.742% +13.119%] (p = 0.00 < 0.05)
                        Performance has regressed.
telemetry_suppression/normal_attach
                        time:   [11.106 ns 11.133 ns 11.166 ns]
                        change: [+8.4063% +8.8465% +9.3006%] (p = 0.00 < 0.05)
                        Performance has regressed.
telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [743.58 ps 744.81 ps 746.44 ps]
                        change: [-0.6749% -0.3983% -0.1260%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
  8 (8.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [743.35 ps 744.08 ps 745.10 ps]
                        change: [-0.4459% -0.1688% +0.0707%] (p = 0.22 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) low severe
  6 (6.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

I'm seeing consistent ~10% regression for enter, no-change for checks.

cijothomas avatar May 14 '25 14:05 cijothomas

That is interesting @cijothomas. What hardware are you running on?

bantonsson avatar May 14 '25 15:05 bantonsson

That is interesting @cijothomas. What hardware are you running on?

This was in m4 pro. I can try in a windows box and update back.

cijothomas avatar May 14 '25 22:05 cijothomas

@lalitb @utpilla could you also help run the benches for this to see if there is consistent improvement/regression?

cijothomas avatar May 22 '25 15:05 cijothomas

PR branch:

$ cargo bench --bench context_suppression
    Finished `bench` profile [optimized + debuginfo] target(s) in 0.13s
     Running benches/context_suppression.rs (/tmp/tbd_a/opentelemetry-rust/target/release/deps/context_suppression-3de0ed5c3fd35dc6)
Gnuplot not found, using plotters backend
telemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [26.002 ns 26.039 ns 26.080 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
telemetry_suppression/normal_attach
                        time:   [26.186 ns 26.201 ns 26.218 ns]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [1.2984 ns 1.3000 ns 1.3016 ns]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [1.2981 ns 1.3005 ns 1.3030 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

Machine Conf:

$ lscpu | grep -i core
Model name:                           AMD EPYC 7763 64-Core Processor
Thread(s) per core:                   2
Core(s) per socket:                   8

$ cat /proc/meminfo | grep -E '^MemTotal'
MemTotal:       65794236 kB

$ uname -a
Linux CPC-labha-5U0JP 6.6.36.3-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Sat Jun 29 07:01:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

lalitb avatar May 22 '25 17:05 lalitb

I can't seem to get consistently good benchmark numbers across the different architectures, so I'll close this work for now.

bantonsson avatar Aug 25 '25 10:08 bantonsson

Just reopening for another small test.

bantonsson avatar Aug 26 '25 13:08 bantonsson