lance

chore: simplify dot implementation to use auto-vectorization

Open eddyxu opened this issue 1 year ago • 1 comment

This change makes the auto-vectorization version of dot(f32) as fast as manually written SIMD.
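For readers unfamiliar with the technique, here is a minimal sketch of what an auto-vectorization-friendly dot product can look like. It is not the code from this PR: the function name `dot_f32` and the chunk width `LANES` are hypothetical, picked only for illustration. The idea is that a plain running sum cannot be reordered by the compiler for floats, whereas a fixed-size array of independent accumulators gives LLVM a loop it can map onto SIMD registers when built with `-C target-cpu=native`.

```rust
/// Hypothetical chunk width (an assumption for this sketch, not a value
/// taken from the PR).
const LANES: usize = 16;

/// Dot product written so the compiler can auto-vectorize the hot loop
/// without explicit SIMD intrinsics. Illustrative only.
pub fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len(), "inputs must have equal length");

    let x_chunks = x.chunks_exact(LANES);
    let y_chunks = y.chunks_exact(LANES);

    // Elements that do not fill a whole chunk are handled with scalar code.
    let tail: f32 = x_chunks
        .remainder()
        .iter()
        .zip(y_chunks.remainder())
        .map(|(a, b)| a * b)
        .sum();

    // Independent accumulators: each slot depends only on itself, so the
    // inner loop has no cross-iteration dependency for the compiler to
    // preserve.
    let mut acc = [0.0f32; LANES];
    for (xc, yc) in x_chunks.zip(y_chunks) {
        for i in 0..LANES {
            acc[i] += xc[i] * yc[i];
        }
    }

    acc.iter().sum::<f32>() + tail
}
```

Because the accumulation order differs from a strictly sequential sum, results can differ from a naive loop in the last bits for general inputs. Whether the loop is actually vectorized depends on the compiler version and target, so benchmarks like the ones in this PR are the way to confirm it.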

Run the benchmarks with:

export RUSTFLAGS="-C target-cpu=native"
git checkout main
cargo bench --bench dot -- --save-baseline dot_main f32
git checkout lei/simplify_dot
cargo bench --bench dot -- --baseline dot_main f32

MacBook M2 Max

Dot(f32, auto-vectorization)
                        time:   [88.812 ms 89.654 ms 90.306 ms]
                        change: [-2.5819% -1.6876% -0.6964%] (p = 0.01 < 0.10)
                        Change within noise threshold.

AMD 5900X

Dot(f32, auto-vectorization)
                        time:   [172.50 ms 176.41 ms 179.41 ms]
                        change: [-2.3545% +0.6133% +3.5448%] (p = 0.69 > 0.10)
                        No change in performance detected.

Intel Sapphire

Dot(f32, auto-vectorization)
                        time:   [331.36 ms 331.62 ms 331.93 ms]
                        change: [-2.3160% -1.1226% -0.3451%] (p = 0.04 < 0.10)
                        Change within noise threshold.

Graviton3

Benchmarking Dot(f32, auto-vectorization): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.8s or enable flat sampling.
Dot(f32, auto-vectorization)
                        time:   [160.62 ms 160.70 ms 160.76 ms]
                        change: [-1.1157% -0.6868% -0.2951%] (p = 0.00 < 0.10)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

eddyxu, Jul 26 '24 17:07

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 79.73%. Comparing base (3edfa50) to head (62228e5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2645      +/-   ##
==========================================
- Coverage   79.81%   79.73%   -0.08%     
==========================================
  Files         224      224              
  Lines       65871    65827      -44     
  Branches    65871    65827      -44     
==========================================
- Hits        52572    52489      -83     
- Misses      10225    10256      +31     
- Partials     3074     3082       +8     
Flag Coverage Δ
unittests 79.73% <100.00%> (-0.08%) :arrow_down:


codecov-commenter, Jul 26 '24 19:07