rustc_codegen_cranelift
Better benchmarks
Currently, only the compilation and execution of very simple crates are benchmarked. An example of a useful benchmark would be https://github.com/ebobby/simple-raytracer.
I tried to compile libstd with MIR inlining for a fairer comparison with cg_llvm, as the latter uses an optimized sysroot while cg_clif doesn't. However, I hit a rustc bug: https://github.com/rust-lang/rust/issues/63802.
$ # Bench cg_llvm, cg_clif+cg_clif sysroot, cg_clif+cg_llvm sysroot
$ hyperfine --prepare "cargo clean" "cargo build" "CHANNEL=release ../cargo.sh build" 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
Benchmark #1: cargo build
Time (mean ± σ): 31.611 s ± 1.041 s [User: 84.973 s, System: 3.935 s]
Range (min … max): 30.514 s … 33.711 s 10 runs
Benchmark #2: CHANNEL=release ../cargo.sh build
Time (mean ± σ): 31.211 s ± 1.130 s [User: 66.759 s, System: 5.140 s]
Range (min … max): 29.462 s … 32.760 s 10 runs
Benchmark #3: RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu
Time (mean ± σ): 29.833 s ± 1.501 s [User: 66.105 s, System: 4.819 s]
Range (min … max): 27.988 s … 32.409 s 10 runs
Summary
'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu' ran
1.05 ± 0.06 times faster than 'CHANNEL=release ../cargo.sh build'
1.06 ± 0.06 times faster than 'cargo build'
The difference between cg_clif (compiled in release mode) and cg_llvm is within noise. This is despite cg_llvm using multiple threads for optimizations, which cg_clif does not, and despite cg_clif containing a lot of sanity checks.
$ # Bench cg_llvm with single thread optimization
$ hyperfine --prepare "cargo clean" "RUSTFLAGS=-Ccodegen-units=1 cargo build"
Benchmark #1: RUSTFLAGS=-Ccodegen-units=1 cargo build
Time (mean ± σ): 35.033 s ± 1.427 s [User: 69.766 s, System: 3.784 s]
Range (min … max): 33.336 s … 38.439 s 10 runs
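One way to sanity-check the "within noise" reading of the first summary is to recompute the relative speeds from the printed means and standard deviations. The sketch below assumes the uncertainty of each ratio comes from first-order error propagation for a quotient of independent measurements. That assumption reproduces the 1.05 ± 0.06 and 1.06 ± 0.06 figures in the summary, but it is not taken from hyperfine's documentation. The function name `relative_speed` is made up for illustration.

```python
import math


def relative_speed(mean, stddev, fastest_mean, fastest_stddev):
    """Return (ratio, uncertainty) of a benchmark relative to the fastest one.

    Assumes independent measurements and first-order error propagation
    for a quotient; this is an assumption, not hyperfine's documented method.
    """
    ratio = mean / fastest_mean
    rel_err = math.hypot(stddev / mean, fastest_stddev / fastest_mean)
    return ratio, ratio * rel_err


# Means and standard deviations (seconds) copied from the first benchmark run.
cg_llvm = (31.611, 1.041)
cg_clif_own_sysroot = (31.211, 1.130)
cg_clif_llvm_sysroot = (29.833, 1.501)  # fastest of the three

for name, (mean, stddev) in [
    ("cg_clif + cg_clif sysroot", cg_clif_own_sysroot),
    ("cg_llvm", cg_llvm),
]:
    ratio, err = relative_speed(mean, stddev, *cg_clif_llvm_sysroot)
    print(f"{name}: {ratio:.2f} ± {err:.2f}")
```

In both cases the distance of the ratio from 1.0 is smaller than its estimated uncertainty, which is what "within noise" amounts to here.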
Keeping the incremental data gives cg_clif a huge advantage over cg_llvm, though:
$ hyperfine --prepare "rm -r target/debug/deps" --warmup 1 "cargo build" "RUSTFLAGS=-Ccodegen-units=1 cargo build" "CHANNEL=release ../cargo.sh build" 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
Benchmark #1: cargo build
Time (mean ± σ): 28.747 s ± 1.363 s [User: 76.805 s, System: 2.840 s]
Range (min … max): 26.742 s … 30.305 s 10 runs
Benchmark #2: RUSTFLAGS=-Ccodegen-units=1 cargo build
Time (mean ± σ): 34.041 s ± 2.252 s [User: 63.653 s, System: 2.885 s]
Range (min … max): 31.641 s … 38.352 s 10 runs
Benchmark #3: CHANNEL=release ../cargo.sh build
Time (mean ± σ): 20.232 s ± 1.091 s [User: 28.533 s, System: 1.291 s]
Range (min … max): 18.719 s … 21.881 s 10 runs
Benchmark #4: RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu
Time (mean ± σ): 20.707 s ± 2.022 s [User: 28.811 s, System: 1.252 s]
Range (min … max): 18.844 s … 25.844 s 10 runs
Summary
'CHANNEL=release ../cargo.sh build' ran
1.02 ± 0.11 times faster than 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
1.42 ± 0.10 times faster than 'cargo build'
1.68 ± 0.14 times faster than 'RUSTFLAGS=-Ccodegen-units=1 cargo build'
Runtime of simple-raytracer:
$ hyperfine ./raytracer-cg_llvm ./raytracer-cg_clif ./raytracer-cg_clif2 ./raytracer-cg_clif3 ./raytracer-cg_clif4
Benchmark #1: ./raytracer-cg_llvm
Time (mean ± σ): 9.483 s ± 0.099 s [User: 9.473 s, System: 0.006 s]
Range (min … max): 9.396 s … 9.710 s 10 runs
Benchmark #2: ./raytracer-cg_clif
Time (mean ± σ): 14.945 s ± 0.026 s [User: 14.935 s, System: 0.005 s]
Range (min … max): 14.910 s … 14.980 s 10 runs
Benchmark #3: ./raytracer-cg_clif2
Time (mean ± σ): 14.091 s ± 0.082 s [User: 14.079 s, System: 0.011 s]
Range (min … max): 13.990 s … 14.301 s 10 runs
Benchmark #4: ./raytracer-cg_clif3
Time (mean ± σ): 14.164 s ± 0.295 s [User: 14.156 s, System: 0.007 s]
Range (min … max): 13.983 s … 14.988 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark #5: ./raytracer-cg_clif4
Time (mean ± σ): 10.750 s ± 0.208 s [User: 10.744 s, System: 0.004 s]
Range (min … max): 10.621 s … 11.312 s 10 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (11.312 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Summary
'./raytracer-cg_llvm' ran
1.13 ± 0.02 times faster than './raytracer-cg_clif4'
1.49 ± 0.02 times faster than './raytracer-cg_clif2'
1.49 ± 0.03 times faster than './raytracer-cg_clif3'
1.58 ± 0.02 times faster than './raytracer-cg_clif'
cg_clif is b9dc950a11509deadf2fa7bf6936184fe6113f4c
cg_clif2 is 40629999bcbf230d14a7ac56d4b56a86b8fad3d8
cg_clif3 is 6127632c761b9a658b13c122ba0beb73f4542399
cg_clif4 is 1018a34662e0b8d9dfa650ed0ee1dfd84242ac37
After 15b9834d7d37d601fd77db11f8852f9ceb0804d0:
Benchmark #1: ./raytracer_cg_llvm
Time (mean ± σ): 7.477 s ± 0.156 s [User: 7.393 s, System: 0.037 s]
Range (min … max): 7.237 s … 7.853 s 20 runs
Benchmark #2: ./raytracer_cg_clif
Time (mean ± σ): 7.372 s ± 0.106 s [User: 7.305 s, System: 0.029 s]
Range (min … max): 7.240 s … 7.669 s 20 runs
Summary
'./raytracer_cg_clif' ran
1.01 ± 0.03 times faster than './raytracer_cg_llvm'
(Benchmarked on a faster machine than the one used in the previous comments.)
There is now pretty much no difference between cg_clif and cg_llvm. 🎉
5b17cf208330511a0cd9b5f496075f150b7821a4 added simple-raytracer as a benchmark.
I just tried to compile veloren using cg_clif and was surprised by the huge gap between cg_clif and cg_llvm:
CHANNEL="release" ../cargo.sh build 1613,63s user 64,18s system 315% cpu 8:51,02 total
cargo build 3272,94s user 50,07s system 315% cpu 17:33,41 total
I haven't tried running the compiled version, though, as threads are not yet supported.
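To put a number on that gap, the wall-clock totals above can be converted to seconds and compared. This is a minimal sketch: `parse_wall_clock` is a hypothetical helper that assumes the colon-separated format with a decimal comma shown in the timing output above.

```python
def parse_wall_clock(text):
    """Convert a 'M:SS,ff' or 'H:MM:SS,ff' total (decimal comma) to seconds."""
    seconds = 0.0
    for part in text.replace(",", ".").split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


cg_clif_total = parse_wall_clock("8:51,02")   # CHANNEL="release" ../cargo.sh build
cg_llvm_total = parse_wall_clock("17:33,41")  # cargo build

print(f"cg_clif: {cg_clif_total:.2f} s")
print(f"cg_llvm: {cg_llvm_total:.2f} s")
print(f"cg_llvm / cg_clif: {cg_llvm_total / cg_clif_total:.2f}x")
```

On this single run, cg_clif finished the build in roughly half the wall-clock time of cg_llvm. Since each configuration was timed only once, the ratio carries no error estimate.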