puffin Optimize frame stats collection

Checklist

[x] I have read the Contributor Guide
[x] I have read and agree to the Code of Conduct
[x] I have added a description of my changes and why I'd like them included in the section below

Description of Changes

Optimize UI performance by calculating frame collection statistics dynamically. Rather than repeatedly traversing each frame, we now update the stats upon adding or removing frames. This change mitigates the significant overhead caused by constantly accessing frame data protected by RwLock.

Also, this update modifies the data containers used for storing frames to adopt binary tree structures. As a result, all frames are sorted at the time of insertion, eliminating the need for subsequent sorting during vector collection.

Moreover, instead of returning vectors, the binary tree structure enables iteration over elements in a sorted order, providing iterators rather than vectors. This functionality gives the API caller the flexibility to either clone frames or simply inspect them in an efficient manner.

Before: before

After: after

Jun 14 '23 13:06 andreiltd

Awesome work! Can you perhaps share before/after of the puffin_viewer and the puffin_egui ui being used in our project? The flamegraph is great but find it hard to compare it in concrete numbers. Would like to see some before after milliseconds per call comparison over maybe 50 frames or something.

Jun 15 '23 17:06 TimonPost

Sure! Here is the diff at about 10k frames collected over all visible frames (~130 frames)

Before: puffin-before-at-10k-frames

After: after-after

Jun 16 '23 09:06 andreiltd

Bench suite results:

     Running benches/benchmark.rs (target/release/deps/benchmark-901adb2054ed4cfe)
Gnuplot not found, using plotters backend
profile_function        time:   [94.230 ns 94.448 ns 94.693 ns]
                        change: [-2.6028% -2.2195% -1.8438%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

profile_function_data   time:   [94.077 ns 94.285 ns 94.511 ns]
                        change: [-3.3842% -3.0679% -2.7380%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

profile_scope           time:   [55.632 ns 55.707 ns 55.784 ns]
                        change: [-0.5181% -0.3122% -0.0932%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

profile_scope_data      time:   [55.965 ns 56.044 ns 56.124 ns]
                        change: [-0.4706% -0.0480% +0.2720%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

flush_frames            time:   [780.93 ns 782.89 ns 784.95 ns]
                        change: [-1.0638% -0.4959% +0.0945%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

profile_function_off    time:   [1.0552 ns 1.0552 ns 1.0553 ns]
                        change: [-0.2772% -0.2522% -0.2273%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

profile_function_data_off
                        time:   [1.2785 ns 1.2855 ns 1.2919 ns]
                        change: [+2.1986% +3.4223% +4.6692%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  4 (4.00%) low mild

profile_scope_off       time:   [1.0552 ns 1.0553 ns 1.0553 ns]
                        change: [-0.0085% +0.0226% +0.0564%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

profile_scope_data_off  time:   [1.0552 ns 1.0553 ns 1.0553 ns]
                        change: [-23.214% -22.525% -21.869%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

Jun 22 '23 06:06 andreiltd

The only relevant benchmark for the changes is flush_frames. We could expect flush_frames performance to regress due to added complexity, specifically caching stats on add_frame calls but the benches show no change in sending frames performance.

Jun 23 '23 12:06 andreiltd

Cool, thanks for doing the bencharks and posting the stats. If the benchmark proves there is not to much overhead I'm happy to merge the code! Think its not a disaster if we make the flush frame slightly longer as at least its predictable. One can argue tho that viewing statistics is less important then profiler performance.

Jun 26 '23 07:06 TimonPost

@andreiltd what happened here? :) Not relevant anymore after pivot?

Jul 08 '24 11:07 kondrak

@kondrak I will add an entry to changelog as requested by @Hoodad and it should be ready to go.

Edit: Done :heavy_check_mark:

Jul 08 '24 11:07 andreiltd