prost
Rewrite varint decode with macros and bitmanip
Since I was already looking around that part of the code, I decided to rewrite decode_varint_slice() using macros, which seemed like the more Rust-like approach to this kind of hand unrolling.
Me being me, I also changed the operations from arithmetic to bit manipulation, since I had a hunch it might make decoding faster.
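The diff itself isn't shown here, but the general idea can be sketched roughly like this: a hypothetical, simplified LEB128 varint decoder (not prost's actual decode_varint_slice), where a macro expands each step of the unrolled loop and the per-byte work uses masks and shifts rather than multiplication and subtraction:

```rust
/// Decode a LEB128-encoded u64 from the front of `bytes`.
/// Returns `Some((value, bytes_read))`, or `None` on truncated
/// or over-long input. Illustrative sketch only.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    // One step of the unrolled loop: read byte `$i`, OR its low
    // 7 bits into the accumulator at the right position, and
    // return early if the continuation bit (0x80) is clear.
    // `b & 0x7f` / `b & 0x80` are the bit-manipulation flavor of
    // what could also be written arithmetically (e.g. `b - 0x80`
    // plus a multiply by a power of two).
    macro_rules! step {
        ($acc:ident, $i:expr) => {{
            let b = *bytes.get($i)? as u64;
            $acc |= (b & 0x7f) << (7 * $i);
            if b & 0x80 == 0 {
                return Some(($acc, $i + 1));
            }
        }};
    }

    let mut acc: u64 = 0;
    step!(acc, 0);
    step!(acc, 1);
    step!(acc, 2);
    step!(acc, 3);
    step!(acc, 4);
    step!(acc, 5);
    step!(acc, 6);
    step!(acc, 7);
    step!(acc, 8);
    step!(acc, 9);
    None // more than 10 bytes: not a valid u64 varint
}
```

Compared to a plain `for` loop, the macro keeps the unrolled form explicit (each step is a separate expansion with a constant shift) while avoiding ten copies of hand-written code.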
$ cargo bench -- decode --baseline master
Finished bench [optimized + debuginfo] target(s) in 0.03s
Running unittests (target/release/deps/varint-90bd7bb67d5b06de)
varint/small/decode time: [209.73 ns 210.15 ns 210.55 ns]
change: [-1.8950% -1.6741% -1.4215%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
varint/medium/decode time: [280.86 ns 281.30 ns 281.81 ns]
change: [-5.7832% -5.6250% -5.4531%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
varint/large/decode time: [456.65 ns 457.50 ns 458.56 ns]
change: [-3.4868% -3.2768% -3.0738%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
varint/mixed/decode time: [352.98 ns 353.96 ns 355.01 ns]
change: [-5.4912% -5.1425% -4.7645%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
That said, the benchmarks seem quite unreliable; if it were 2-3% across the board I wouldn't even mention it. So I'd love for someone to double-check the performance.
CC: @ajguerrer