rust-protobuf icon indicating copy to clipboard operation
rust-protobuf copied to clipboard

perf: fix encode_varint{32,64}

Open morrisonlevi opened this issue 1 year ago • 2 comments

The previous code was full of jumps that aren't easily predictable. This version is less code, generates less code, and should perform better (well, on x86_64 anyway, I am not an aarch64 expert). All while using safe Rust.

You can compare assembly on godbolt. The current version of encode_varint64 is 99 lines long and has ~10 labels, the new version is 27 lines long with only 3 labels.

I haven't included benchmarking numbers because I wasn't sure what characteristics you'd be looking for. Based on the x86_64 assembly, it should definitely out-perform. Anything in particular you are looking for there?

Edit: I also simplified encoded_varint64_len while I was at it. It gets nicer assembly as a result as well but obviously matters less.

morrisonlevi avatar Jan 31 '24 03:01 morrisonlevi

Interestingly, the miri failure is in std:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=92348df8e81c457c63c78c0845bf9e3b.

fn main() {
    match 2u64.checked_pow(64) {
        Some(n) => println!("2^64: {n}"),
        None => {}
    };
}

Given this is a checked_* operation, I would not expect an overflow related issue:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.33s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: overflowing shift by 64 in `unchecked_shl`
    --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/mod.rs:1181:5
     |
1181 | /     uint_impl! {
1182 | |         Self = u64,
1183 | |         ActualT = u64,
1184 | |         SignedT = i64,
...    |
1198 | |         bound_condition = "",
1199 | |     }
     | |_____^ overflowing shift by 64 in `unchecked_shl`
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: BACKTRACE:
     = note: inside `core::num::<impl u64>::checked_pow` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/uint_macros.rs:1382:31: 1385:18
note: inside `main`
    --> src/main.rs:2:11
     |
2    |     match 2u64.checked_pow(64) {
     |           ^^^^^^^^^^^^^^^^^^^^
     = note: this error originates in the macro `uint_impl` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error

I have not reported this upstream nor looked for existing issues yet.

morrisonlevi avatar Jan 31 '24 22:01 morrisonlevi

I opened an issue: https://github.com/rust-lang/rust/issues/120537.

morrisonlevi avatar Jan 31 '24 22:01 morrisonlevi

Merged the part about encoded_varint64_len. Can you please rebase and confirm miri is no longer an issue?

stepancheg avatar Feb 25 '24 21:02 stepancheg

I merged in master, and Miri looks good now 👍🏻

morrisonlevi avatar Feb 26 '24 16:02 morrisonlevi

OK, if this is still an issue, some benchmark is needed. Smaller machine code does not mean code is faster.

(Project is not very well maintained now, so long responses, sorry.)

stepancheg avatar Jun 16 '24 19:06 stepancheg

Re-analyzing this, I made a mistake previously: the jumps are actually quite predictable. With this in mind, I would expect this PR to be a slight regression (since it isn't unrolled), but it gave me some ideas to improve it...

...but then I realized that someone already beat me to these ideas: https://github.com/as-com/varint-simd. I think it's worth considering using their implementation on x86_64, but there's no point in doing it in this PR.

So for now, we'll just count the encoded_varint64_len as a win and not merge the rest.

morrisonlevi avatar Jun 17 '24 20:06 morrisonlevi