rust-protobuf
rust-protobuf copied to clipboard
perf: fix encode_varint{32,64}
The previous code was full of jumps that aren't easily predictable. This version is less code, generates less code, and should perform better (well, on x86_64 anyway, I am not an aarch64 expert). All while using safe Rust.
You can compare assembly on godbolt. The current version of encode_varint64 is 99 lines long and has ~10 labels, the new version is 27 lines long with only 3 labels.
I haven't included benchmarking numbers because I wasn't sure what characteristics you'd be looking for. Based on the x86_64 assembly, it should definitely out-perform. Anything in particular you are looking for there?
Edit: I also simplified encoded_varint64_len while I was at it. It gets nicer assembly as a result as well but obviously matters less.
Interestingly, the miri failure is in std:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=92348df8e81c457c63c78c0845bf9e3b.
fn main() {
match 2u64.checked_pow(64) {
Some(n) => println!("2^64: {n}"),
None => {}
};
}
Given this is a checked_* operation, I would not expect an overflow related issue:
Compiling playground v0.0.1 (/playground)
Finished dev [unoptimized + debuginfo] target(s) in 0.33s
Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: overflowing shift by 64 in `unchecked_shl`
--> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/mod.rs:1181:5
|
1181 | / uint_impl! {
1182 | | Self = u64,
1183 | | ActualT = u64,
1184 | | SignedT = i64,
... |
1198 | | bound_condition = "",
1199 | | }
| |_____^ overflowing shift by 64 in `unchecked_shl`
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside `core::num::<impl u64>::checked_pow` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/uint_macros.rs:1382:31: 1385:18
note: inside `main`
--> src/main.rs:2:11
|
2 | match 2u64.checked_pow(64) {
| ^^^^^^^^^^^^^^^^^^^^
= note: this error originates in the macro `uint_impl` (in Nightly builds, run with -Z macro-backtrace for more info)
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to 1 previous error
I have not reported this upstream nor looked for existing issues yet.
I opened an issue: https://github.com/rust-lang/rust/issues/120537.
Merged the part about encoded_varint64_len. Can you please rebase and confirm miri is no longer an issue?
I merged in master, and Miri looks good now 👍🏻
OK, if this is still an issue, some benchmark is needed. Smaller machine code does not mean code is faster.
(Project is not very well maintained now, so long responses, sorry.)
Re-analyzing this, I made a mistake previously: the jumps are actually quite predictable. With this in mind, I would expect this PR to be a slight regression (since it isn't unrolled), but it gave me some ideas to improve it...
...but then I realized that someone already beat me to these ideas: https://github.com/as-com/varint-simd. I think it's worth considering using their implementation on x86_64, but there's no point in doing it in this PR.
So for now, we'll just count the encoded_varint64_len as a win and not merge the rest.