Optimize SIMD impl of `lines_fwd` and `lines_bwd`
Benchmark results on AMD Zen4:
simd/lines_fwd/1 time: [1.4801 ns 1.4803 ns 1.4804 ns]
thrpt: [644.19 MiB/s 644.26 MiB/s 644.34 MiB/s]
change:
time: [−38.958% −38.933% −38.898%] (p = 0.00 < 0.05)
thrpt: [+63.661% +63.755% +63.821%]
Performance has improved.
simd/lines_fwd/8 time: [3.8571 ns 3.8607 ns 3.8641 ns]
thrpt: [1.9282 GiB/s 1.9299 GiB/s 1.9316 GiB/s]
change:
time: [−19.458% −19.423% −19.394%] (p = 0.00 < 0.05)
thrpt: [+24.060% +24.105% +24.159%]
Performance has improved.
simd/lines_fwd/128 time: [14.064 ns 14.092 ns 14.120 ns]
thrpt: [8.4426 GiB/s 8.4592 GiB/s 8.4764 GiB/s]
change:
time: [−6.0955% −5.8911% −5.6706%] (p = 0.00 < 0.05)
thrpt: [+6.0115% +6.2599% +6.4912%]
Performance has improved.
simd/lines_fwd/1024 time: [18.160 ns 18.178 ns 18.195 ns]
thrpt: [52.415 GiB/s 52.462 GiB/s 52.516 GiB/s]
change:
time: [−4.1174% −3.9973% −3.8859%] (p = 0.00 < 0.05)
thrpt: [+4.0430% +4.1637% +4.2942%]
Performance has improved.
simd/lines_fwd/131072 time: [871.09 ns 871.14 ns 871.21 ns]
thrpt: [140.12 GiB/s 140.13 GiB/s 140.14 GiB/s]
change:
time: [−10.451% −10.405% −10.358%] (p = 0.00 < 0.05)
thrpt: [+11.555% +11.613% +11.670%]
Performance has improved.
simd/lines_fwd/134217728
time: [1.7326 ms 1.7332 ms 1.7338 ms]
thrpt: [72.094 GiB/s 72.120 GiB/s 72.146 GiB/s]
change:
time: [−0.4091% −0.2348% −0.0670%] (p = 0.00 < 0.05)
thrpt: [+0.0671% +0.2353% +0.4108%]
Change within noise threshold.
@microsoft-github-policy-service agree company="Loongson"
Benchmark results on Intel i7-7700K (https://github.com/microsoft/edit/commit/e20e0061dcce6b577fcd64f42b3b07ef5d5311d1):
simd/lines_fwd/1 time: [2.9242 ns 2.9293 ns 2.9350 ns]
thrpt: [324.93 MiB/s 325.56 MiB/s 326.13 MiB/s]
change:
time: [−39.147% −38.969% −38.816%] (p = 0.00 < 0.05)
thrpt: [+63.443% +63.850% +64.331%]
Performance has improved.
simd/lines_fwd/8 time: [7.7873 ns 7.7982 ns 7.8092 ns]
thrpt: [976.98 MiB/s 978.35 MiB/s 979.72 MiB/s]
change:
time: [−17.301% −17.160% −17.018%] (p = 0.00 < 0.05)
thrpt: [+20.508% +20.714% +20.921%]
Performance has improved.
simd/lines_fwd/128 time: [27.227 ns 27.327 ns 27.488 ns]
thrpt: [4.3368 GiB/s 4.3623 GiB/s 4.3783 GiB/s]
change:
time: [−4.5703% −4.0676% −3.4233%] (p = 0.00 < 0.05)
thrpt: [+3.5446% +4.2401% +4.7892%]
Performance has improved.
simd/lines_fwd/1024 time: [38.485 ns 38.561 ns 38.646 ns]
thrpt: [24.677 GiB/s 24.732 GiB/s 24.780 GiB/s]
change:
time: [−5.3727% −4.8970% −4.2502%] (p = 0.00 < 0.05)
thrpt: [+4.4389% +5.1492% +5.6777%]
Performance has improved.
simd/lines_fwd/131072 time: [1.8623 µs 1.8700 µs 1.8804 µs]
thrpt: [64.917 GiB/s 65.279 GiB/s 65.547 GiB/s]
change:
time: [−14.964% −14.690% −14.387%] (p = 0.00 < 0.05)
thrpt: [+16.804% +17.220% +17.598%]
Performance has improved.
simd/lines_fwd/134217728
time: [5.6005 ms 5.6123 ms 5.6264 ms]
thrpt: [22.217 GiB/s 22.272 GiB/s 22.319 GiB/s]
change:
time: [−3.4977% −3.2437% −2.9432%] (p = 0.00 < 0.05)
thrpt: [+3.0324% +3.3524% +3.6244%]
Performance has improved.
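For anyone cross-checking the figures above: the `thrpt` values follow from the input size in the benchmark name divided by the measured time, and the `thrpt` change follows from the `time` change as `1 / (1 + time_change) - 1`. The sketch below illustrates that arithmetic only. The helper names `throughput_mib_s` and `thrpt_change` are made up for this example and are not part of the benchmark harness or of this PR.

```rust
/// Throughput in MiB/s for `bytes` processed in `nanos` nanoseconds.
/// (Hypothetical helper, for illustration only.)
fn throughput_mib_s(bytes: u64, nanos: f64) -> f64 {
    let seconds = nanos * 1e-9;
    (bytes as f64) / seconds / (1024.0 * 1024.0)
}

/// Relative throughput change implied by a relative time change.
/// Both values are fractions, e.g. -0.38933 for -38.933%.
/// (Hypothetical helper, for illustration only.)
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // simd/lines_fwd/1 on Zen4: 1 byte in about 1.4803 ns.
    println!("{:.2} MiB/s", throughput_mib_s(1, 1.4803));

    // A time change of -38.933% corresponds to roughly +63.755% throughput.
    println!("{:+.3}%", thrpt_change(-0.38933) * 100.0);
}
```

The results agree with the reported numbers up to rounding of the printed time. This is also why a ~39% drop in time at size 1 shows up as a ~64% gain in throughput: the two percentages are not symmetric.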