
Optimize varint encoding

Open MelonShooter opened this issue 3 years ago • 4 comments

This optimizes varint encoding by turning the unbounded `loop` into a bounded `for` loop, which lets the compiler see that the loop can be unrolled.
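As a rough sketch of the technique (not the actual prost implementation, which writes into a `BufMut` rather than a `Vec<u8>`): a `u64` LEB128 varint occupies at most 10 bytes, so iterating `0..10` gives the compiler a known trip count to unroll against.

```rust
/// Hypothetical sketch of LEB128 varint encoding with a bounded loop.
/// prost's real encoder differs in buffer type and structure.
fn encode_varint(mut value: u64, buf: &mut Vec<u8>) {
    // A u64 needs at most ceil(64 / 7) = 10 varint bytes, so the
    // bounded range lets the compiler fully unroll this loop.
    for _ in 0..10 {
        if value < 0x80 {
            // Final byte: high bit clear signals end of the varint.
            buf.push(value as u8);
            return;
        }
        // Emit the low 7 bits with the continuation bit set.
        buf.push(((value & 0x7F) | 0x80) as u8);
        value >>= 7;
    }
}
```

For example, `300` encodes as `[0xAC, 0x02]`: the low 7 bits (`0x2C`) with the continuation bit, followed by the remaining `2`.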

Bench:

varint/small/encode     time:   [61.577 ns 61.786 ns 62.016 ns]
                        change: [-18.793% -17.983% -17.024%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

varint/medium/encode    time:   [312.28 ns 313.01 ns 313.78 ns]
                        change: [-13.850% -13.002% -12.209%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

varint/large/encode     time:   [533.79 ns 535.11 ns 536.43 ns]
                        change: [-26.542% -25.921% -25.275%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

varint/mixed/encode     time:   [308.34 ns 309.11 ns 309.87 ns]
                        change: [-23.418% -22.569% -21.718%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

MelonShooter avatar Nov 28 '22 08:11 MelonShooter

I am not seeing the same improvements on my laptop:

varint/small/encode     time:   [254.51 ns 255.08 ns 255.78 ns]
                        change: [+4.9943% +5.3430% +5.6646%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe

varint/small/decode     time:   [230.34 ns 230.96 ns 231.62 ns]
                        change: [+0.0577% +0.4059% +0.7424%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

varint/small/encoded_len
                        time:   [57.304 ns 57.385 ns 57.477 ns]
                        change: [-0.3950% -0.1263% +0.1243%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

varint/medium/encode    time:   [1.2231 us 1.2262 us 1.2301 us]
                        change: [+5.3350% +5.6709% +6.0007%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

varint/medium/decode    time:   [240.14 ns 240.56 ns 241.05 ns]
                        change: [-0.8120% -0.5223% -0.2217%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

varint/medium/encoded_len
                        time:   [57.344 ns 57.443 ns 57.556 ns]
                        change: [-0.3119% -0.0467% +0.2367%] (p = 0.73 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

varint/large/encode     time:   [2.3195 us 2.3250 us 2.3310 us]
                        change: [-1.1808% -0.8771% -0.5842%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high mild

varint/large/decode     time:   [352.08 ns 352.89 ns 353.81 ns]
                        change: [-0.3626% -0.0670% +0.2361%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

varint/large/encoded_len
                        time:   [57.386 ns 57.487 ns 57.598 ns]
                        change: [-0.5971% -0.2721% +0.0337%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

varint/mixed/encode     time:   [1.4749 us 1.4777 us 1.4808 us]
                        change: [-2.2308% -1.9438% -1.6452%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

varint/mixed/decode     time:   [286.76 ns 287.35 ns 288.00 ns]
                        change: [-0.4404% -0.1303% +0.1851%] (p = 0.41 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  11 (11.00%) high mild
  1 (1.00%) high severe

varint/mixed/encoded_len
                        time:   [57.407 ns 57.537 ns 57.689 ns]
                        change: [-0.5136% -0.1899% +0.1612%] (p = 0.29 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

LucioFranco avatar Dec 12 '22 18:12 LucioFranco

Could you explain more what system you ran those benchmarks on?

LucioFranco avatar Dec 12 '22 18:12 LucioFranco

Apologies for the late response. I ran this on an Intel i5-10300H on Ubuntu 22.04 through WSL. I don't remember what rustc version I ran the original benchmarks on, but I ran it again just now on rustc 1.71.1 and got similar results. What compiler version and CPU did you use to run those benchmarks?

varint/small/encode     time:   [68.863 ns 69.823 ns 70.857 ns]                                
                        change: [-22.839% -20.072% -16.939%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

varint/small/decode     time:   [160.91 ns 162.97 ns 165.16 ns]                                
                        change: [-1.9325% +1.1842% +4.5143%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

varint/small/encoded_len                                                                            
                        time:   [86.179 ns 88.462 ns 90.912 ns]
                        change: [+0.7213% +4.2934% +7.8878%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

varint/medium/encode    time:   [358.18 ns 364.79 ns 372.55 ns]                                 
                        change: [-19.784% -16.565% -13.286%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

varint/medium/decode    time:   [237.39 ns 241.08 ns 245.41 ns]                                 
                        change: [-21.082% -15.509% -9.7628%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe

varint/medium/encoded_len                                                                            
                        time:   [82.511 ns 83.446 ns 84.489 ns]
                        change: [+0.2021% +3.5902% +7.4506%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

varint/large/encode     time:   [636.18 ns 647.51 ns 659.99 ns]                                 
                        change: [-26.210% -23.732% -20.976%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

varint/large/decode     time:   [357.62 ns 363.08 ns 369.10 ns]                                
                        change: [-4.2180% -0.3207% +3.2406%] (p = 0.87 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

varint/large/encoded_len                                                                            
                        time:   [81.459 ns 82.667 ns 83.991 ns]
                        change: [-4.8438% -1.2792% +2.3932%] (p = 0.49 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

varint/mixed/encode     time:   [348.02 ns 352.78 ns 358.36 ns]                                
                        change: [-26.256% -23.085% -20.160%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

varint/mixed/decode     time:   [237.17 ns 241.66 ns 246.70 ns]                                
                        change: [-4.2677% -1.0296% +2.5542%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

varint/mixed/encoded_len                                                                            
                        time:   [83.815 ns 84.976 ns 86.126 ns]
                        change: [-5.0328% -1.2148% +2.4944%] (p = 0.53 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

MelonShooter avatar Aug 12 '23 17:08 MelonShooter

The current master branch now uses `for _ in 0..10 {` to tell the compiler there are at most 10 iterations. Could you rerun your benchmarks to see whether this PR is still an improvement?

caspermeijn avatar Jul 12 '24 07:07 caspermeijn