rustc_codegen_cranelift Improve results on the rustc-perf benchmark suite

I only ran the debug benchmarks: check should be identical, and release will definitely compile faster because cg_clif performs far fewer optimizations.

Except for some stress tests, the clean and baseline incremental results are quite positive (~10-60% improvement, often ~40%). For clean incremental the results are much worse (easily ~200%), as compiled object files are not stored in the incremental cache (#760). For patched incremental the results are very mixed: sometimes the difference is just a little less than for clean incremental, while in other cases it is up to ~70% faster than cg_llvm.
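
For reading the percentages above, here is a minimal sketch of how such a number can be computed. It assumes the figures are relative changes in wall time with cg_llvm as the baseline; the function name and the timings are hypothetical and are not measured results.

```rust
/// Relative change of `new_secs` against `baseline_secs`, in percent.
/// Negative values mean the new run is faster (an improvement),
/// positive values mean it is slower (a regression).
fn percent_change(baseline_secs: f64, new_secs: f64) -> f64 {
    (new_secs - baseline_secs) / baseline_secs * 100.0
}

fn main() {
    // Hypothetical timings, chosen only to illustrate the arithmetic.
    let cg_llvm = 10.0;
    let cg_clif_faster = 6.0; // takes 40% less time than the baseline
    let cg_clif_slower = 30.0; // takes three times as long as the baseline

    println!("{:+.1}%", percent_change(cg_llvm, cg_clif_faster));
    println!("{:+.1}%", percent_change(cg_llvm, cg_clif_slower));
}
```

Under this assumption a "~200%" regression means the build takes roughly three times as long as with cg_llvm.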

packed-simd failed due to a verifier error. Edit (2020-03-11): Opened #919.
~~hyper-2 failed due to unsized locals not being implemented (used for impl FnOnce for Box<FnOnce>).~~ Edit (2020-03-11): Fixed in #916.
style-servo failed due to running out of disk space.

Patch for rustc-perf

diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..4f577183 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,19 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Zcodegen-backend=~/Documents/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot ~/Documents/cg_clif/build_sysroot/sysroot");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
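
The patch above hard-codes the cg_clif paths inside RUSTFLAGS. A minimal sketch of building the same value from a checkout directory is shown here. The helper name `cg_clif_rustflags` and the `CG_CLIF_DIR` variable are hypothetical and are not part of rustc-perf or cg_clif. `std::process::Command` passes environment values verbatim without involving a shell, so a literal `~` would not be expanded; the sketch resolves the home directory from `HOME` instead.

```rust
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Build the RUSTFLAGS value used by the patch from a cg_clif checkout directory.
/// RUSTFLAGS is split on spaces, so this assumes the path contains none.
fn cg_clif_rustflags(cg_clif_dir: &Path) -> String {
    let backend = cg_clif_dir.join("target/release/librustc_codegen_cranelift.so");
    let sysroot = cg_clif_dir.join("build_sysroot/sysroot");
    format!(
        "-Cpanic=abort -Zcodegen-backend={} --sysroot {}",
        backend.display(),
        sysroot.display()
    )
}

fn main() {
    // CG_CLIF_DIR is a hypothetical override for this sketch; otherwise fall
    // back to $HOME/Documents/cg_clif, mirroring the path in the patch.
    let dir = env::var_os("CG_CLIF_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| {
            let home = env::var_os("HOME")
                .map(PathBuf::from)
                .unwrap_or_else(|| PathBuf::from("."));
            home.join("Documents/cg_clif")
        });

    // Configure the command the same way the patch does, but only print it.
    let mut cmd = Command::new("cargo");
    cmd.arg("rustc")
        .env("RUSTFLAGS", cg_clif_rustflags(&dir))
        .arg("--target")
        .arg("x86_64-unknown-linux-gnu");
    println!("{:?}", cmd);
}
```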

Results

Jan 26 '20 16:01 bjorn3

Results after #918:

There are still regressions compared to cg_llvm, but most of the incremental compilation times have improved compared to cg_llvm.

Results

Mar 11 '20 20:03 bjorn3

New patch for the collector:

diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..1caecc8c 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,20 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot /home/bjorn/Documenten/cg_clif/build_sysroot/sysroot");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
+            cmd.env("CG_CLIF_INCR_CACHE", "1");
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

Edit: Flipped the default for incremental caching of object files in 198037119520d8cccafdc1fd511164c63d741aed, so the old patch is correct again.

Mar 11 '20 20:03 bjorn3

A lot of the red entries (regressions) are caused by the linker taking much more time (up to 90%!).

Mar 12 '20 22:03 bjorn3

5d516f9e118d6527947ca5deb3d76bbc4fa0f8a1 is a 20%-50% improvement on the coercions-debug benchmark. Overall it is a ~2% improvement.

Mar 14 '20 16:03 bjorn3

Current results with lld:

Results

Patch for rustc-perf

diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..9787da13 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,21 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Clink-args=-fuse-ld=lld -Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot /home/bjorn/Documenten/cg_clif/build_sysroot/sysroot");
+            //cmd.env("RUSTFLAGS", "-Cpanic=abort -Clink-args=-fuse-ld=lld");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
+            cmd.env("CG_CLIF_INCR_CACHE", "1");
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

Mar 14 '20 19:03 bjorn3

Although there are still regressions, they are almost entirely found in the tiny stress-test benchmarks. Most real-world benchmarks are seeing fantastic improvements!

Wonderful work, @bjorn3!

Mar 14 '20 20:03 vultix

There are a few places where a non-stress-test benchmark regresses by a few percent in one of the incremental benchmarks. Other than that, many stress-test benchmarks regress because of slower linking. Improving this will benefit all other executable benchmarks too. For example, the helloworld-debug regression can be completely explained by longer linking times. In fact, the codegen part is faster with cg_clif.

Mar 14 '20 20:03 bjorn3

Reran the benchmarks with Firefox and VS Code closed. Now only regression-31157-debug patched incremental is a significant regression.

Mar 15 '20 22:03 bjorn3

With such huge improvements, how much work would you say is left for MVP?

Mar 15 '20 22:03 vultix

There are still missing features as mentioned in https://hackmd.io/@bjorn3/HJL5ryFS8. I don't know how long it will take to implement most of them. Some are hard, while others are less hard.

Mar 15 '20 22:03 bjorn3

Are there any recent rustc-perf runs? I'm especially curious about the JIT mode.

Oct 11 '21 23:10 NotAFile

Not recently. Don't expect the JIT mode to be faster than AOT compilation. The JIT mode currently doesn't support incremental compilation, which makes it slower.

Oct 12 '21 05:10 bjorn3

Here are the latest results, using commit df7f02072b64712e5322ea70675135cb1e20bf80.

[Screenshot: rustc-perf compare page, start=LLVM, end=CG_CLIF, stat=wall-time]

CG_CLIF

diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ec71984f 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,21 @@ impl<'a> CargoProcess<'a> {
                 };
 
             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.env(
+                "RUSTFLAGS",
+                "-Zcodegen-backend=/home/jasew/workspace/rustc_codegen_cranelift/build/lib/librustc_codegen_cranelift.so",
+            );
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

LLVM

diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ca34d0a3 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,17 @@ impl<'a> CargoProcess<'a> {
                 };
 
             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.arg("-j1");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

Notes:

I needed perf; https://gist.github.com/abel0b/b1881e41b9e1c4b16d84e5e083c38a13 worked fine.
rustc-perf: https://github.com/rust-lang/rustc-perf

Processor: AMD Ryzen 9 5950X 16-Core Processor, 3.40 GHz
Installed RAM: 32.0 GB

Dec 08 '21 21:12 jasonwilliams

cc https://github.com/bjorn3/rustc_codegen_cranelift/pull/1271

Aug 25 '22 17:08 bjorn3
