Optimize varint encoding
This optimizes varint encoding by rewriting the open-ended loop as a bounded `for` loop, which lets the compiler see that it can unroll the loop.
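As a rough illustration of the transformation (a hypothetical sketch, not the crate's exact code): LEB128 varint encoding rewritten from an open-ended `loop` into a `for` loop over a fixed-size scratch buffer, so the compiler can see a hard upper bound of 10 iterations and unroll it.

```rust
// Baseline: open-ended loop; the compiler cannot see a trip-count bound.
fn encode_varint_while(mut value: u64, buf: &mut Vec<u8>) {
    loop {
        if value < 0x80 {
            buf.push(value as u8);
            break;
        }
        // Emit the low 7 bits with the continuation bit set.
        buf.push(((value & 0x7F) | 0x80) as u8);
        value >>= 7;
    }
}

// Rewrite: a u64 varint is at most 10 bytes, so indexing a fixed-size
// scratch array in a `for` loop makes the maximum trip count visible.
fn encode_varint_for(mut value: u64, buf: &mut Vec<u8>) {
    let mut scratch = [0u8; 10];
    for i in 0..10 {
        if value < 0x80 {
            scratch[i] = value as u8;
            buf.extend_from_slice(&scratch[..=i]);
            return;
        }
        scratch[i] = ((value & 0x7F) | 0x80) as u8;
        value >>= 7;
    }
}
```

Both functions produce identical output; only the loop shape the optimizer sees differs.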
Bench:
varint/small/encode time: [61.577 ns 61.786 ns 62.016 ns]
change: [-18.793% -17.983% -17.024%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
varint/medium/encode time: [312.28 ns 313.01 ns 313.78 ns]
change: [-13.850% -13.002% -12.209%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
varint/large/encode time: [533.79 ns 535.11 ns 536.43 ns]
change: [-26.542% -25.921% -25.275%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe
varint/mixed/encode time: [308.34 ns 309.11 ns 309.87 ns]
change: [-23.418% -22.569% -21.718%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
I am not seeing the same improvements on my laptop:
varint/small/encode time: [254.51 ns 255.08 ns 255.78 ns]
change: [+4.9943% +5.3430% +5.6646%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) high mild
8 (8.00%) high severe
varint/small/decode time: [230.34 ns 230.96 ns 231.62 ns]
change: [+0.0577% +0.4059% +0.7424%] (p = 0.03 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
varint/small/encoded_len time: [57.304 ns 57.385 ns 57.477 ns]
change: [-0.3950% -0.1263% +0.1243%] (p = 0.35 > 0.05)
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
varint/medium/encode time: [1.2231 us 1.2262 us 1.2301 us]
change: [+5.3350% +5.6709% +6.0007%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
varint/medium/decode time: [240.14 ns 240.56 ns 241.05 ns]
change: [-0.8120% -0.5223% -0.2217%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
varint/medium/encoded_len time: [57.344 ns 57.443 ns 57.556 ns]
change: [-0.3119% -0.0467% +0.2367%] (p = 0.73 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
varint/large/encode time: [2.3195 us 2.3250 us 2.3310 us]
change: [-1.1808% -0.8771% -0.5842%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild
varint/large/decode time: [352.08 ns 352.89 ns 353.81 ns]
change: [-0.3626% -0.0670% +0.2361%] (p = 0.66 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
varint/large/encoded_len time: [57.386 ns 57.487 ns 57.598 ns]
change: [-0.5971% -0.2721% +0.0337%] (p = 0.10 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high mild
varint/mixed/encode time: [1.4749 us 1.4777 us 1.4808 us]
change: [-2.2308% -1.9438% -1.6452%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
varint/mixed/decode time: [286.76 ns 287.35 ns 288.00 ns]
change: [-0.4404% -0.1303% +0.1851%] (p = 0.41 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
varint/mixed/encoded_len time: [57.407 ns 57.537 ns 57.689 ns]
change: [-0.5136% -0.1899% +0.1612%] (p = 0.29 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
Could you say more about what system you ran those benchmarks on?
Apologies for the late response. I ran these on an Intel i5-10300H on Ubuntu 22.04 through WSL. I don't remember which rustc version the original benchmarks used, but I reran them just now on rustc 1.71.1 and got similar results. What compiler version and CPU did you use to run your benchmarks?
varint/small/encode time: [68.863 ns 69.823 ns 70.857 ns]
change: [-22.839% -20.072% -16.939%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
varint/small/decode time: [160.91 ns 162.97 ns 165.16 ns]
change: [-1.9325% +1.1842% +4.5143%] (p = 0.48 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
varint/small/encoded_len time: [86.179 ns 88.462 ns 90.912 ns]
change: [+0.7213% +4.2934% +7.8878%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
varint/medium/encode time: [358.18 ns 364.79 ns 372.55 ns]
change: [-19.784% -16.565% -13.286%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
varint/medium/decode time: [237.39 ns 241.08 ns 245.41 ns]
change: [-21.082% -15.509% -9.7628%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
varint/medium/encoded_len time: [82.511 ns 83.446 ns 84.489 ns]
change: [+0.2021% +3.5902% +7.4506%] (p = 0.05 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
varint/large/encode time: [636.18 ns 647.51 ns 659.99 ns]
change: [-26.210% -23.732% -20.976%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
varint/large/decode time: [357.62 ns 363.08 ns 369.10 ns]
change: [-4.2180% -0.3207% +3.2406%] (p = 0.87 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
varint/large/encoded_len time: [81.459 ns 82.667 ns 83.991 ns]
change: [-4.8438% -1.2792% +2.3932%] (p = 0.49 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
varint/mixed/encode time: [348.02 ns 352.78 ns 358.36 ns]
change: [-26.256% -23.085% -20.160%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
varint/mixed/decode time: [237.17 ns 241.66 ns 246.70 ns]
change: [-4.2677% -1.0296% +2.5542%] (p = 0.58 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
varint/mixed/encoded_len time: [83.815 ns 84.976 ns 86.126 ns]
change: [-5.0328% -1.2148% +2.4944%] (p = 0.53 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
The current master branch now uses `for _ in 0..10 {` to tell the compiler there are at most 10 iterations. Can you rerun your benchmarks to see whether this PR is still an improvement?
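For reference, the counted-loop shape mentioned above looks roughly like this (a sketch of the pattern, not the exact master-branch code):

```rust
// The `for _ in 0..10` bound makes the maximum trip count explicit to
// the optimizer: a u64 LEB128 varint never exceeds 10 bytes.
fn encode_varint(mut value: u64, buf: &mut Vec<u8>) {
    for _ in 0..10 {
        if value < 0x80 {
            // Final byte: high bit clear terminates the varint.
            buf.push(value as u8);
            return;
        }
        // Continuation byte: low 7 bits plus the continuation flag.
        buf.push(((value & 0x7F) | 0x80) as u8);
        value >>= 7;
    }
}
```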