Optimize frame stats collection
Checklist
- [x] I have read the Contributor Guide
- [x] I have read and agree to the Code of Conduct
- [x] I have added a description of my changes and why I'd like them included in the section below
Description of Changes
Optimize UI performance by calculating frame collection statistics dynamically. Rather than repeatedly traversing each frame, we now update the stats upon adding or removing frames. This change mitigates the significant overhead caused by constantly accessing frame data protected by RwLock.
Also, this update modifies the data containers used for storing frames to adopt binary tree structures. As a result, all frames are sorted at the time of insertion, eliminating the need for subsequent sorting during vector collection.
Moreover, instead of returning vectors, the binary tree structure enables iteration over elements in a sorted order, providing iterators rather than vectors. This functionality gives the API caller the flexibility to either clone frames or simply inspect them in an efficient manner.
Before:
After:
Awesome work! Can you perhaps share before/after of the puffin_viewer and the puffin_egui ui being used in our project? The flamegraph is great but find it hard to compare it in concrete numbers. Would like to see some before after milliseconds per call comparison over maybe 50 frames or something.
Sure! Here is the diff at about 10k frames collected over all visible frames (~130 frames)
Before:
After:
Bench suite results:
Running benches/benchmark.rs (target/release/deps/benchmark-901adb2054ed4cfe)
Gnuplot not found, using plotters backend
profile_function time: [94.230 ns 94.448 ns 94.693 ns]
change: [-2.6028% -2.2195% -1.8438%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) high mild
3 (3.00%) high severe
profile_function_data time: [94.077 ns 94.285 ns 94.511 ns]
change: [-3.3842% -3.0679% -2.7380%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild
profile_scope time: [55.632 ns 55.707 ns 55.784 ns]
change: [-0.5181% -0.3122% -0.0932%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
profile_scope_data time: [55.965 ns 56.044 ns 56.124 ns]
change: [-0.4706% -0.0480% +0.2720%] (p = 0.83 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high mild
flush_frames time: [780.93 ns 782.89 ns 784.95 ns]
change: [-1.0638% -0.4959% +0.0945%] (p = 0.10 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
profile_function_off time: [1.0552 ns 1.0552 ns 1.0553 ns]
change: [-0.2772% -0.2522% -0.2273%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
profile_function_data_off
time: [1.2785 ns 1.2855 ns 1.2919 ns]
change: [+2.1986% +3.4223% +4.6692%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) low severe
4 (4.00%) low mild
profile_scope_off time: [1.0552 ns 1.0553 ns 1.0553 ns]
change: [-0.0085% +0.0226% +0.0564%] (p = 0.20 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
profile_scope_data_off time: [1.0552 ns 1.0553 ns 1.0553 ns]
change: [-23.214% -22.525% -21.869%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
The only relevant benchmark for the changes is flush_frames. We could expect flush_frames performance to regress due to added complexity, specifically caching stats on add_frame calls but the benches show no change in sending frames performance.
Cool, thanks for doing the bencharks and posting the stats. If the benchmark proves there is not to much overhead I'm happy to merge the code! Think its not a disaster if we make the flush frame slightly longer as at least its predictable. One can argue tho that viewing statistics is less important then profiler performance.
@andreiltd what happened here? :) Not relevant anymore after pivot?
@kondrak I will add an entry to changelog as requested by @Hoodad and it should be ready to go.
Edit: Done :heavy_check_mark: