
Optimize SIMD impl of `lines_fwd` and `lines_bwd`

Open · heiher opened this issue 5 months ago · 2 comments
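For context, `lines_fwd` and `lines_bwd` presumably seek forward or backward through a text buffer by counting newline bytes. The scalar sketch below models that behavior under assumed semantics only. `lines_fwd_ref` and `lines_bwd_ref` are hypothetical names, and their simplified signatures (plain `usize` offsets and line indices) are not copied from the microsoft/edit source. A plain loop like this is the kind of reference that an optimized SIMD kernel can be checked against.

```rust
/// Hypothetical scalar model of forward line seeking (assumed semantics,
/// not the edit crate's actual API). Starting at byte `offset` on line
/// `line`, scan forward until line `line_stop` is reached or the input ends.
/// Returns (new offset, line index at that offset). When the target is
/// reached, the offset points just past the newline that completed the seek.
fn lines_fwd_ref(haystack: &[u8], offset: usize, mut line: usize, line_stop: usize) -> (usize, usize) {
    let mut pos = offset;
    while line < line_stop && pos < haystack.len() {
        if haystack[pos] == b'\n' {
            line += 1;
        }
        pos += 1;
    }
    (pos, line)
}

/// Hypothetical scalar model of backward line seeking (assumed semantics,
/// not the edit crate's actual API). Starting at byte `offset` on line
/// `line`, scan backward and return the start of line `line_stop`, or
/// offset 0 if the beginning of the input comes first.
fn lines_bwd_ref(haystack: &[u8], offset: usize, mut line: usize, line_stop: usize) -> (usize, usize) {
    let mut pos = offset;
    while pos > 0 {
        if haystack[pos - 1] == b'\n' {
            // `pos` is the first byte of line `line`; stop if it is the target.
            if line <= line_stop {
                break;
            }
            // Otherwise step over the newline into the previous line.
            line -= 1;
        }
        pos -= 1;
    }
    (pos, line)
}
```

For example, with the input `b"a\nbb\nccc\n"`, `lines_fwd_ref(text, 0, 0, 2)` returns `(5, 2)` (the start of `ccc`), and `lines_bwd_ref(text, 6, 2, 1)` returns `(2, 1)` (the start of `bb`).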

Benchmark results on AMD Zen4:

 simd/lines_fwd/1        time:   [1.4801 ns 1.4803 ns 1.4804 ns]
                         thrpt:  [644.19 MiB/s 644.26 MiB/s 644.34 MiB/s]
                  change:
                         time:   [−38.958% −38.933% −38.898%] (p = 0.00 < 0.05)
                         thrpt:  [+63.661% +63.755% +63.821%]
                         Performance has improved.

 simd/lines_fwd/8        time:   [3.8571 ns 3.8607 ns 3.8641 ns]
                         thrpt:  [1.9282 GiB/s 1.9299 GiB/s 1.9316 GiB/s]
                  change:
                         time:   [−19.458% −19.423% −19.394%] (p = 0.00 < 0.05)
                         thrpt:  [+24.060% +24.105% +24.159%]
                         Performance has improved.

 simd/lines_fwd/128      time:   [14.064 ns 14.092 ns 14.120 ns]
                         thrpt:  [8.4426 GiB/s 8.4592 GiB/s 8.4764 GiB/s]
                  change:
                         time:   [−6.0955% −5.8911% −5.6706%] (p = 0.00 < 0.05)
                         thrpt:  [+6.0115% +6.2599% +6.4912%]
                         Performance has improved.

 simd/lines_fwd/1024     time:   [18.160 ns 18.178 ns 18.195 ns]
                         thrpt:  [52.415 GiB/s 52.462 GiB/s 52.516 GiB/s]
                  change:
                         time:   [−4.1174% −3.9973% −3.8859%] (p = 0.00 < 0.05)
                         thrpt:  [+4.0430% +4.1637% +4.2942%]
                         Performance has improved.

 simd/lines_fwd/131072   time:   [871.09 ns 871.14 ns 871.21 ns]
                         thrpt:  [140.12 GiB/s 140.13 GiB/s 140.14 GiB/s]
                  change:
                         time:   [−10.451% −10.405% −10.358%] (p = 0.00 < 0.05)
                         thrpt:  [+11.555% +11.613% +11.670%]
                         Performance has improved.

 simd/lines_fwd/134217728
                         time:   [1.7326 ms 1.7332 ms 1.7338 ms]
                         thrpt:  [72.094 GiB/s 72.120 GiB/s 72.146 GiB/s]
                  change:
                         time:   [−0.4091% −0.2348% −0.0670%] (p = 0.00 < 0.05)
                         thrpt:  [+0.0671% +0.2353% +0.4108%]
                         Change within noise threshold.

heiher, Jun 26 '25 15:06
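The `thrpt` lines in these reports follow directly from the `time` lines. Assume the numeric suffix of each benchmark name is the input size in bytes. Throughput is then that size divided by the measured time, reported in binary units (1 GiB = 2^30 bytes). A relative time change of dt corresponds to a throughput change of 1/(1 + dt) - 1. The sketch below reproduces a few of the Zen4 medians under that assumption. The helper names are hypothetical.

```rust
/// Throughput in GiB/s for `bytes` processed in `time_ns` nanoseconds.
/// Criterion's byte throughput uses binary units: 1 GiB = 2^30 bytes.
fn throughput_gib_s(bytes: f64, time_ns: f64) -> f64 {
    let bytes_per_sec = bytes / (time_ns * 1e-9);
    bytes_per_sec / (1u64 << 30) as f64
}

/// Convert a percentage change in time into the matching percentage change
/// in throughput: if time scales by (1 + dt), throughput scales by 1 / (1 + dt).
fn thrpt_change_pct(time_change_pct: f64) -> f64 {
    (1.0 / (1.0 + time_change_pct / 100.0) - 1.0) * 100.0
}
```

For example, `throughput_gib_s(1024.0, 18.178)` evaluates to about 52.46, which matches the median for `simd/lines_fwd/1024`. Likewise, `thrpt_change_pct(-38.933)` evaluates to about 63.75, which matches the `+63.755%` reported for `simd/lines_fwd/1`. The same relationship explains why the largest time reductions, seen on the 1- and 8-byte inputs, appear as even larger throughput gains.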

@microsoft-github-policy-service agree company="Loongson"

heiher, Jun 26 '25 15:06

Benchmark results on Intel Core i7-7700K (commit https://github.com/microsoft/edit/commit/e20e0061dcce6b577fcd64f42b3b07ef5d5311d1):

simd/lines_fwd/1        time:   [2.9242 ns 2.9293 ns 2.9350 ns]
                        thrpt:  [324.93 MiB/s 325.56 MiB/s 326.13 MiB/s]
                 change:
                        time:   [−39.147% −38.969% −38.816%] (p = 0.00 < 0.05)
                        thrpt:  [+63.443% +63.850% +64.331%]
                        Performance has improved.

simd/lines_fwd/8        time:   [7.7873 ns 7.7982 ns 7.8092 ns]
                        thrpt:  [976.98 MiB/s 978.35 MiB/s 979.72 MiB/s]
                 change:
                        time:   [−17.301% −17.160% −17.018%] (p = 0.00 < 0.05)
                        thrpt:  [+20.508% +20.714% +20.921%]
                        Performance has improved.

simd/lines_fwd/128      time:   [27.227 ns 27.327 ns 27.488 ns]
                        thrpt:  [4.3368 GiB/s 4.3623 GiB/s 4.3783 GiB/s]
                 change:
                        time:   [−4.5703% −4.0676% −3.4233%] (p = 0.00 < 0.05)
                        thrpt:  [+3.5446% +4.2401% +4.7892%]
                        Performance has improved.

simd/lines_fwd/1024     time:   [38.485 ns 38.561 ns 38.646 ns]
                        thrpt:  [24.677 GiB/s 24.732 GiB/s 24.780 GiB/s]
                 change:
                        time:   [−5.3727% −4.8970% −4.2502%] (p = 0.00 < 0.05)
                        thrpt:  [+4.4389% +5.1492% +5.6777%]
                        Performance has improved.

simd/lines_fwd/131072   time:   [1.8623 µs 1.8700 µs 1.8804 µs]
                        thrpt:  [64.917 GiB/s 65.279 GiB/s 65.547 GiB/s]
                 change:
                        time:   [−14.964% −14.690% −14.387%] (p = 0.00 < 0.05)
                        thrpt:  [+16.804% +17.220% +17.598%]
                        Performance has improved.

simd/lines_fwd/134217728
                        time:   [5.6005 ms 5.6123 ms 5.6264 ms]
                        thrpt:  [22.217 GiB/s 22.272 GiB/s 22.319 GiB/s]
                 change:
                        time:   [−3.4977% −3.2437% −2.9432%] (p = 0.00 < 0.05)
                        thrpt:  [+3.0324% +3.3524% +3.6244%]
                        Performance has improved.

heiher, Jun 26 '25 16:06