rustc_codegen_cranelift

Better benchmarks

Open bjorn3 opened this issue 6 years ago • 7 comments

Currently, only the compilation and execution of very simple crates are benchmarked. An example of a more useful benchmark would be https://github.com/ebobby/simple-raytracer.

bjorn3, Aug 21 '19 12:08

I tried to compile libstd with MIR inlining for a fairer comparison with cg_llvm, as the latter uses an optimized sysroot, while cg_clif doesn't. However, I hit a rustc bug: https://github.com/rust-lang/rust/issues/63802.

bjorn3, Aug 22 '19 09:08

$ # Bench cg_llvm, cg_clif+cg_clif sysroot, cg_clif+cg_llvm sysroot
$ hyperfine --prepare "cargo clean" "cargo build" "CHANNEL=release ../cargo.sh build" 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
Benchmark #1: cargo build
  Time (mean ± σ):     31.611 s ±  1.041 s    [User: 84.973 s, System: 3.935 s]
  Range (min … max):   30.514 s … 33.711 s    10 runs
 
Benchmark #2: CHANNEL=release ../cargo.sh build
  Time (mean ± σ):     31.211 s ±  1.130 s    [User: 66.759 s, System: 5.140 s]
  Range (min … max):   29.462 s … 32.760 s    10 runs
 
Benchmark #3: RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu
  Time (mean ± σ):     29.833 s ±  1.501 s    [User: 66.105 s, System: 4.819 s]
  Range (min … max):   27.988 s … 32.409 s    10 runs
 
Summary
  'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu' ran
    1.05 ± 0.06 times faster than 'CHANNEL=release ../cargo.sh build'
    1.06 ± 0.06 times faster than 'cargo build'

The difference between cg_clif (compiled in release mode) and cg_llvm is within noise. This is despite cg_llvm using multiple threads for optimizations, which cg_clif does not, and despite cg_clif containing a lot of sanity checks.
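As a rough check of the "within noise" reading, the ratios in the hyperfine summary can be recomputed from the reported means and standard deviations. The sketch below assumes first-order error propagation for a quotient; that assumption matches the printed 1.05 ± 0.06 and 1.06 ± 0.06 here, but it is not taken from hyperfine's documentation. The function and variable names are made up for illustration.

```python
import math


def relative_speed(slow, fast):
    """Return (ratio, sigma) for slow/fast, each given as (mean, stddev).

    Assumption: the uncertainty of the quotient is estimated with
    first-order error propagation (relative errors added in quadrature).
    """
    mean_s, sd_s = slow
    mean_f, sd_f = fast
    ratio = mean_s / mean_f
    sigma = ratio * math.sqrt((sd_s / mean_s) ** 2 + (sd_f / mean_f) ** 2)
    return ratio, sigma


# (mean, stddev) in seconds, copied from the hyperfine output above.
cg_llvm = (31.611, 1.041)               # cargo build
cg_clif = (31.211, 1.130)               # CHANNEL=release ../cargo.sh build
cg_clif_llvm_sysroot = (29.833, 1.501)  # -Zcodegen-backend=... cargo build

for name, result in [("cg_clif", cg_clif), ("cg_llvm", cg_llvm)]:
    ratio, sigma = relative_speed(result, cg_clif_llvm_sysroot)
    # A ratio whose interval (ratio - sigma, ratio + sigma) contains 1.0
    # is hard to tell apart from run-to-run noise.
    within_noise = ratio - sigma <= 1.0 <= ratio + sigma
    print(f"{ratio:.2f} ± {sigma:.2f} times faster than {name} "
          f"(within noise: {within_noise})")
```

Both intervals include 1.0, which is consistent with reading the three configurations as indistinguishable on this machine.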

$ # Bench cg_llvm with single thread optimization
$ hyperfine --prepare "cargo clean" "RUSTFLAGS=-Ccodegen-units=1 cargo build"
Benchmark #1: RUSTFLAGS=-Ccodegen-units=1 cargo build
  Time (mean ± σ):     35.033 s ±  1.427 s    [User: 69.766 s, System: 3.784 s]
  Range (min … max):   33.336 s … 38.439 s    10 runs

bjorn3, Aug 22 '19 09:08

Keeping the incremental data gives cg_clif a huge advantage over cg_llvm though:

$ hyperfine --prepare "rm -r target/debug/deps" --warmup 1 "cargo build" "RUSTFLAGS=-Ccodegen-units=1 cargo build" "CHANNEL=release ../cargo.sh build" 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
Benchmark #1: cargo build
  Time (mean ± σ):     28.747 s ±  1.363 s    [User: 76.805 s, System: 2.840 s]
  Range (min … max):   26.742 s … 30.305 s    10 runs
 
Benchmark #2: RUSTFLAGS=-Ccodegen-units=1 cargo build
  Time (mean ± σ):     34.041 s ±  2.252 s    [User: 63.653 s, System: 2.885 s]
  Range (min … max):   31.641 s … 38.352 s    10 runs
 
Benchmark #3: CHANNEL=release ../cargo.sh build
  Time (mean ± σ):     20.232 s ±  1.091 s    [User: 28.533 s, System: 1.291 s]
  Range (min … max):   18.719 s … 21.881 s    10 runs
 
Benchmark #4: RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu
  Time (mean ± σ):     20.707 s ±  2.022 s    [User: 28.811 s, System: 1.252 s]
  Range (min … max):   18.844 s … 25.844 s    10 runs
 
Summary
  'CHANNEL=release ../cargo.sh build' ran
    1.02 ± 0.11 times faster than 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
    1.42 ± 0.10 times faster than 'cargo build'
    1.68 ± 0.14 times faster than 'RUSTFLAGS=-Ccodegen-units=1 cargo build'
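To put a number on that advantage, the mean wall-clock times from the clean-build run can be compared with the means from this run. This is only a rough sketch: the two runs used different `--prepare` commands and only this one used `--warmup 1`, so the comparison is indicative rather than exact. The dictionary and function names are illustrative.

```python
# Mean wall-clock times in seconds, copied from the two hyperfine runs above.
# "cg_llvm" is `cargo build`; "cg_clif" is `CHANNEL=release ../cargo.sh build`.
clean_build = {"cg_llvm": 31.611, "cg_clif": 31.211}       # --prepare "cargo clean"
incremental_kept = {"cg_llvm": 28.747, "cg_clif": 20.232}  # --prepare "rm -r target/debug/deps"


def fraction_saved(before, after):
    """Fraction of wall-clock time saved when going from `before` to `after`."""
    return 1.0 - after / before


savings = {
    backend: fraction_saved(clean_build[backend], incremental_kept[backend])
    for backend in clean_build
}

for backend, saved in savings.items():
    print(f"{backend}: {saved:.1%} less wall-clock time with incremental data kept")
```

Under these assumptions, cg_clif saves roughly a third of its build time when the incremental data is kept, while cg_llvm saves under a tenth.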

bjorn3, Aug 22 '19 10:08

Runtime of simple-raytracer:

$ hyperfine ./raytracer-cg_llvm ./raytracer-cg_clif ./raytracer-cg_clif2 ./raytracer-cg_clif3 ./raytracer-cg_clif4
Benchmark #1: ./raytracer-cg_llvm
  Time (mean ± σ):      9.483 s ±  0.099 s    [User: 9.473 s, System: 0.006 s]
  Range (min … max):    9.396 s …  9.710 s    10 runs
 
Benchmark #2: ./raytracer-cg_clif
  Time (mean ± σ):     14.945 s ±  0.026 s    [User: 14.935 s, System: 0.005 s]
  Range (min … max):   14.910 s … 14.980 s    10 runs
 
Benchmark #3: ./raytracer-cg_clif2
  Time (mean ± σ):     14.091 s ±  0.082 s    [User: 14.079 s, System: 0.011 s]
  Range (min … max):   13.990 s … 14.301 s    10 runs
 
Benchmark #4: ./raytracer-cg_clif3
  Time (mean ± σ):     14.164 s ±  0.295 s    [User: 14.156 s, System: 0.007 s]
  Range (min … max):   13.983 s … 14.988 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark #5: ./raytracer-cg_clif4
  Time (mean ± σ):     10.750 s ±  0.208 s    [User: 10.744 s, System: 0.004 s]
  Range (min … max):   10.621 s … 11.312 s    10 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (11.312 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Summary
  './raytracer-cg_llvm' ran
    1.13 ± 0.02 times faster than './raytracer-cg_clif4'
    1.49 ± 0.02 times faster than './raytracer-cg_clif2'
    1.49 ± 0.03 times faster than './raytracer-cg_clif3'
    1.58 ± 0.02 times faster than './raytracer-cg_clif'

cg_clif is b9dc950a11509deadf2fa7bf6936184fe6113f4c
cg_clif2 is 40629999bcbf230d14a7ac56d4b56a86b8fad3d8
cg_clif3 is 6127632c761b9a658b13c122ba0beb73f4542399
cg_clif4 is 1018a34662e0b8d9dfa650ed0ee1dfd84242ac37

bjorn3, Aug 28 '19 16:08

After 15b9834d7d37d601fd77db11f8852f9ceb0804d0:

Benchmark #1: ./raytracer_cg_llvm
  Time (mean ± σ):      7.477 s ±  0.156 s    [User: 7.393 s, System: 0.037 s]
  Range (min … max):    7.237 s …  7.853 s    20 runs
 
Benchmark #2: ./raytracer_cg_clif
  Time (mean ± σ):      7.372 s ±  0.106 s    [User: 7.305 s, System: 0.029 s]
  Range (min … max):    7.240 s …  7.669 s    20 runs
 
Summary
  './raytracer_cg_clif' ran
    1.01 ± 0.03 times faster than './raytracer_cg_llvm'

(Benchmarked on a faster machine than the one used in the previous comments, so the absolute times are not comparable.)

There is now pretty much no runtime difference between cg_clif and cg_llvm on this benchmark. 🎉

bjorn3, Aug 30 '19 14:08

5b17cf208330511a0cd9b5f496075f150b7821a4 added simple-raytracer as a benchmark.

bjorn3, Aug 30 '19 15:08

I just tried compiling veloren with cg_clif and was surprised by the huge gap between cg_clif and cg_llvm:

CHANNEL="release" ../cargo.sh build  1613,63s user 64,18s system 315% cpu 8:51,02 total
                        cargo build  3272,94s user 50,07s system 315% cpu 17:33,41 total

I haven't tried running the compiled version, though, as threads are not yet supported.
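The size of the gap can be read off the two timing lines. The sketch below assumes they are shell `time` summaries in a locale that uses a comma as the decimal separator, and that `../cargo.sh build` is the cg_clif build while plain `cargo build` is cg_llvm, as in the earlier comments. The regular expression and helper names are illustrative.

```python
import re

# The two timing lines quoted above, copied verbatim.
LINES = {
    "cg_clif": 'CHANNEL="release" ../cargo.sh build  1613,63s user 64,18s system 315% cpu 8:51,02 total',
    "cg_llvm": "cargo build  3272,94s user 50,07s system 315% cpu 17:33,41 total",
}

PATTERN = re.compile(
    r"(?P<user>[\d,]+)s user (?P<system>[\d,]+)s system "
    r"(?P<cpu>\d+)% cpu (?P<total>[\d:,]+) total"
)


def to_seconds(text):
    """Convert '17:33,41' or '64,18' (comma as decimal separator) to seconds."""
    seconds = 0.0
    for part in text.replace(",", ".").split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse(line):
    """Extract user, system and wall-clock seconds from one timing line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised timing line: {line!r}")
    return {
        "user": to_seconds(match["user"]),
        "system": to_seconds(match["system"]),
        "wall": to_seconds(match["total"]),
    }


timings = {name: parse(line) for name, line in LINES.items()}

wall_ratio = timings["cg_llvm"]["wall"] / timings["cg_clif"]["wall"]
cpu_ratio = (timings["cg_llvm"]["user"] + timings["cg_llvm"]["system"]) / (
    timings["cg_clif"]["user"] + timings["cg_clif"]["system"]
)

print(f"cg_llvm / cg_clif wall-clock time: {wall_ratio:.2f}x")
print(f"cg_llvm / cg_clif CPU time:        {cpu_ratio:.2f}x")
```

Both ratios come out just under 2, so on this single run the cg_llvm build took about twice as long as the cg_clif build in both wall-clock and CPU time.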

bjorn3, Jan 06 '20 20:01