PhastFT
Optimising `cobra_apply`
See #46
Codecov Report
All modified and coverable lines are covered by tests.
Project coverage is 99.26%. Comparing base (2e67b5c) to head (4f87d2e).
Coverage Diff (main vs #47):

| | main | #47 | +/- |
|---|---:|---:|---:|
| Coverage | 99.16% | 99.26% | +0.09% |
| Files | 12 | 12 | |
| Lines | 2167 | 2165 | -2 |
| Hits | 2149 | 2149 | |
| Misses | 18 | 16 | -2 |
On my Zen 4 CPU this is a consistent regression in the default configuration and makes little difference with -C target-cpu=native:
cargo bench --bench=bit_reversal
Running benches/bit_reversal.rs (target/release/deps/bit_reversal-7310e7572d98d06c)
cobra_apply/cobra/15 time: [53.719 µs 53.849 µs 54.000 µs]
change: [+20.898% +21.193% +21.543%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
cobra_apply/cobra/16 time: [105.59 µs 105.76 µs 105.92 µs]
change: [+24.041% +24.295% +24.599%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
cobra_apply/cobra/17 time: [210.84 µs 211.11 µs 211.40 µs]
change: [+24.459% +24.613% +24.777%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
cobra_apply/cobra/18 time: [417.87 µs 418.34 µs 418.83 µs]
change: [+24.279% +24.441% +24.611%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
cobra_apply/cobra/19 time: [839.32 µs 839.56 µs 839.83 µs]
change: [+25.252% +25.328% +25.406%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
RUSTFLAGS='-C target-cpu=native' cargo bench --bench=bit_reversal
cobra_apply/cobra/15 time: [40.607 µs 40.631 µs 40.656 µs]
change: [−1.5313% −1.4557% −1.3814%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
cobra_apply/cobra/16 time: [79.011 µs 79.043 µs 79.082 µs]
change: [+0.5886% +0.6283% +0.6723%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
cobra_apply/cobra/17 time: [158.08 µs 158.18 µs 158.29 µs]
change: [+1.4714% +1.5514% +1.6366%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
4 (4.00%) high severe
cobra_apply/cobra/18 time: [315.23 µs 315.38 µs 315.53 µs]
change: [+3.6858% +3.7494% +3.8149%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
cobra_apply/cobra/19 time: [634.41 µs 634.94 µs 635.44 µs]
change: [+3.3992% +3.4810% +3.5690%] (p = 0.00 < 0.05)
Performance has regressed.
On what hardware did you measure it?
Interesting!
My CPU is an AMD Ryzen™ 5 5625U with Radeon™ Graphics (12 threads).
That said, I didn't set the target-cpu...
Hmm. Rust version? Mine is rustc 1.91.1 (ed61e7d7e 2025-11-07)
rust 1.91.0
The +/- percentages might be a bit messed up because of the order in which I ran the benches, but the absolute numbers show a stark benefit from the LUT!
With LUT, target-cpu=native
Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.1372 s (81k i
cobra_apply/cobra/15 time: [64.174 µs 64.578 µs 65.008 µs]
change: [−21.340% −20.823% −20.346%] (p = 0.00
Without LUT, target-cpu=native
Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.2045 s (50k i
cobra_apply/cobra/15 time: [103.51 µs 103.85 µs 104.23 µs]
change: [+60.409% +61.184% +62.011%] (p = 0.00
With LUT, no RUSTFLAGS
cobra_apply/cobra/15 time: [76.037 µs 76.244 µs 76.491 µs]
change: [−28.284% −27.892% −27.445%] (p = 0.00
Without LUT, no RUSTFLAGS
Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.3685 s (50k i
cobra_apply/cobra/15 time: [106.26 µs 106.56 µs 106.90 µs]
change: [+2.3883% +2.8431% +3.2761%] (p = 0.00
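The thread doesn't show how the LUT version works. Purely as a point of reference, here is one common way a lookup table is used for bit reversal: reverse each index a byte at a time through a 256-entry table, then swap element pairs in place. This is the plain index-by-index permutation, not COBRA's cache-blocked scheme, and it is an assumption rather than what this PR does; `REV8`, `reverse_bits_lut` and `bit_reverse_in_place` are hypothetical names, not PhastFT's API.

```rust
/// Hypothetical 256-entry table: REV8[b] is byte `b` with its 8 bits reversed.
/// Built at compile time by a const fn.
const REV8: [u8; 256] = build_rev8();

const fn build_rev8() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        let mut b = i as u8;
        let mut r = 0u8;
        let mut k = 0;
        // Move bits out of `b` from the bottom and into `r` from the bottom,
        // which reverses their order.
        while k < 8 {
            r = (r << 1) | (b & 1);
            b >>= 1;
            k += 1;
        }
        table[i] = r;
        i += 1;
    }
    table
}

/// Reverse the low `bits` bits of `x` using four table lookups.
/// Requires 1 <= bits <= 32 and x < 2^bits.
fn reverse_bits_lut(x: u32, bits: u32) -> u32 {
    debug_assert!((1..=32).contains(&bits));
    // Reverse all 32 bits: each byte is reversed via the table and moved
    // to the mirrored byte position.
    let full = (REV8[(x & 0xff) as usize] as u32) << 24
        | (REV8[((x >> 8) & 0xff) as usize] as u32) << 16
        | (REV8[((x >> 16) & 0xff) as usize] as u32) << 8
        | REV8[(x >> 24) as usize] as u32;
    // The reversed low `bits` bits now sit at the top; shift them down.
    full >> (32 - bits)
}

/// Permute `data` into bit-reversed index order, in place.
/// `data.len()` must be a power of two between 2 and 2^32.
fn bit_reverse_in_place<T>(data: &mut [T]) {
    let n = data.len();
    assert!(n.is_power_of_two() && n >= 2, "length must be a power of two >= 2");
    let bits = n.trailing_zeros();
    assert!(bits <= 32, "indices must fit in u32");
    for i in 0..n {
        let j = reverse_bits_lut(i as u32, bits) as usize;
        // Each pair (i, j) is visited twice; swap it only once.
        if i < j {
            data.swap(i, j);
        }
    }
}
```

For example, applied to `[0, 1, 2, 3, 4, 5, 6, 7]` it yields `[0, 4, 2, 6, 1, 5, 3, 7]`.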
Tip: to make the percentages meaningful, you can run `cargo bench --bench=bit_reversal -- --save-baseline=main` followed by `cargo bench --bench=bit_reversal -- --baseline=main`, and Criterion will calculate the percentages relative to the baseline saved by the first command.
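By default Criterion compares each run against the previous one, which is why the `change:` lines above depend on the order the benches were run. Comparing the two variants directly only needs the point estimates (the middle value of each `time:` line) and a little arithmetic. A minimal sketch; `relative_change_pct` and `speedup` are hypothetical helpers, and they ignore the confidence intervals Criterion reports:

```rust
/// Percent change of `new_us` relative to `baseline_us`.
/// Negative means the new variant is faster.
fn relative_change_pct(baseline_us: f64, new_us: f64) -> f64 {
    (new_us - baseline_us) / baseline_us * 100.0
}

/// Speedup factor of the new variant over the baseline (> 1.0 means faster).
fn speedup(baseline_us: f64, new_us: f64) -> f64 {
    baseline_us / new_us
}
```

Taking the no-LUT runs as the baseline, this gives about −37.8% (roughly 1.61× faster) with `target-cpu=native` (103.85 µs → 64.578 µs) and about −28.4% (roughly 1.40× faster) without RUSTFLAGS (106.56 µs → 76.244 µs).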
Ah, that's useful. Criterion is one of those tools that I severely underuse. If I ever read the docs, I could probably figure out how to group the with- and without-LUT variants into a single benchmark for a much easier direct comparison.
I don't think there's a one-size-fits-all solution. If we want to reap these gains, we'll need to copy FFTW's design and measure the performance of various implementations at runtime, then select the fastest one.
I've looked into COBRA some more and it's highly hardware-dependent: https://github.com/QuState/PhastFT/issues/49
We really do just need to start going down the FFTW route, measure the different variants in the planner and pick the best one for the hardware we're running on.
It would be great to have your LUT-based version as one of the options.
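A minimal sketch of the FFTW-style planner idea above, assuming the planner simply times each candidate on a scratch buffer of the target size and keeps the fastest. The two candidates (`naive` and `intrinsic`) are hypothetical stand-ins for PhastFT's real variants, such as the existing COBRA path and the LUT-based version; `Candidate` and `plan` are likewise illustrative names, not PhastFT or FFTW APIs.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical candidate: a name plus an in-place bit-reversal permutation.
type Candidate = (&'static str, fn(&mut [f64]));

/// Candidate 1 (illustrative): reverse each index one bit at a time.
/// `data.len()` must be a power of two.
fn naive(data: &mut [f64]) {
    let n = data.len();
    let bits = n.trailing_zeros();
    for i in 0..n {
        let (mut x, mut j) = (i, 0usize);
        for _ in 0..bits {
            j = (j << 1) | (x & 1);
            x >>= 1;
        }
        // Swap each pair only once.
        if i < j {
            data.swap(i, j);
        }
    }
}

/// Candidate 2 (illustrative): use the standard library's `usize::reverse_bits`.
/// `data.len()` must be a power of two >= 2.
fn intrinsic(data: &mut [f64]) {
    let n = data.len();
    // Shift the reversed word down so only the low log2(n) bits remain.
    let shift = usize::BITS - n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> shift;
        if i < j {
            data.swap(i, j);
        }
    }
}

/// Time every candidate on a scratch buffer of length `n` (a power of two >= 2)
/// and return the fastest. Each candidate gets one warm-up call, then `reps` timed calls.
fn plan(candidates: &[Candidate], n: usize, reps: u32) -> Candidate {
    assert!(n.is_power_of_two() && n >= 2, "n must be a power of two >= 2");
    assert!(!candidates.is_empty(), "need at least one candidate");
    let mut scratch: Vec<f64> = (0..n).map(|i| i as f64).collect();
    let mut best = candidates[0];
    let mut best_time = Duration::MAX;
    for &cand in candidates {
        let f = cand.1;
        // Warm-up so no candidate is penalised for running on a cold buffer.
        f(scratch.as_mut_slice());
        let start = Instant::now();
        for _ in 0..reps {
            // black_box hints the optimiser not to discard the work.
            f(black_box(scratch.as_mut_slice()));
        }
        let elapsed = start.elapsed();
        if elapsed < best_time {
            best_time = elapsed;
            best = cand;
        }
    }
    best
}
```

A call such as `plan(&candidates, 1 << 15, 10)` returns whichever variant was fastest on the current machine for that size. A real planner would also repeat and filter the measurements and cache the choice per size (FFTW saves such results as "wisdom"); this sketch does a single timed batch per candidate.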