rustc_codegen_cranelift

Improve results on the rustc-perf benchmark suite

Open bjorn3 opened this issue 5 years ago • 15 comments

I only ran the debug benchmarks: check results should be identical for both backends, and release builds will definitely compile faster because cg_clif performs far fewer optimizations.

Except for some stress tests, the clean and baseline incremental results are quite positive (~10-60% improvement, often ~40%). For clean incremental the results are much worse (easily ~200%), as compiled object files are not stored in the incremental cache (#760). For patched incremental the results are very mixed: sometimes the difference is just a little smaller than for clean incremental, while in other cases cg_clif is up to ~70% faster than cg_llvm.
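
To make the percentages above easier to read, here is a minimal sketch of the arithmetic. It assumes the figures are relative wall-time changes of cg_clif measured against cg_llvm, with negative values meaning cg_clif is faster. The function name and the sample timings in the note below are hypothetical and not taken from the benchmark run.

```rust
/// Relative change of `candidate_secs` against `baseline_secs`, in percent.
/// Negative values mean the candidate (e.g. cg_clif) took less time.
/// Assumption: this mirrors how the comparison percentages are read;
/// it is not copied from rustc-perf itself.
fn relative_change_percent(baseline_secs: f64, candidate_secs: f64) -> f64 {
    (candidate_secs - baseline_secs) * 100.0 / baseline_secs
}
```

With made-up timings of 10 s for cg_llvm and 6 s for cg_clif this yields -40, i.e. a 40% improvement. A 10 s baseline against a 30 s run yields +200, which is the scale of the clean incremental regression described above.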

  • packed-simd failed due to a verifier error. Edit(2020-03-11): Opened #919.
  • hyper-2 failed due to unsized locals not being implemented (used for impl FnOnce for Box<FnOnce>). Edit(2020-03-11): Fixed in #916.
  • style-servo failed due to running out of disk space.

Patch for rustc-perf
diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..4f577183 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,19 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Zcodegen-backend=~/Documents/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot ~/Documents/cg_clif/build_sysroot/sysroot");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
Results

image

bjorn3 avatar Jan 26 '20 16:01 bjorn3

Results after #918:

There are still regressions compared to cg_llvm, but most of the incremental compilation times have improved compared to cg_llvm.

Results

image

bjorn3 avatar Mar 11 '20 20:03 bjorn3

New patch for the collector:

diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..1caecc8c 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,20 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot /home/bjorn/Documenten/cg_clif/build_sysroot/sysroot");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
+            cmd.env("CG_CLIF_INCR_CACHE", "1");
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

Edit: Flipped the default of incr caching of object files in 198037119520d8cccafdc1fd511164c63d741aed, so the old patch is correct again.
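
For readers who want to reproduce the setup outside the collector, the following is a rough sketch of the same configuration using `std::process::Command`. It assumes that `base_command` ends up invoking `cargo rustc`; the function name `cg_clif_command` and its parameters are hypothetical. The flags and the `CG_CLIF_INCR_CACHE` variable are the ones from the patches above.

```rust
use std::path::Path;
use std::process::Command;

/// Builds a `cargo rustc` invocation that uses cg_clif as the codegen backend.
/// `backend` and `sysroot` are hypothetical parameters; pass absolute paths,
/// because no shell is involved and a leading `~` would not be expanded.
fn cg_clif_command(backend: &Path, sysroot: &Path, incr_cache: bool) -> Command {
    // Same flags as in the patch, with the paths substituted in.
    let rustflags = format!(
        "-Cpanic=abort -Zcodegen-backend={} --sysroot {}",
        backend.display(),
        sysroot.display()
    );
    let mut cmd = Command::new("cargo");
    cmd.arg("rustc")
        .arg("--target")
        .arg("x86_64-unknown-linux-gnu")
        .env("RUSTFLAGS", rustflags);
    if incr_cache {
        // Variable set by the patch to enable caching of object files.
        cmd.env("CG_CLIF_INCR_CACHE", "1");
    }
    cmd
}
```

The returned `Command` is only configured, not run; a caller would still add `-p <pkgid>` and call `status()` or `output()` on it. Taking the paths as parameters avoids hard-coding a home directory as the patches do.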

bjorn3 avatar Mar 11 '20 20:03 bjorn3

A lot of the red (regressed) entries are caused by the linker taking much more time. (Up to 90%!)

bjorn3 avatar Mar 12 '20 22:03 bjorn3

5d516f9e118d6527947ca5deb3d76bbc4fa0f8a1 is a 20%-50% improvement on the coercions-debug benchmark. Overall it is a ~2% improvement.

bjorn3 avatar Mar 14 '20 16:03 bjorn3

Current results with lld:

Results

image

Patch for rustc-perf
diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..9787da13 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,21 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Clink-args=-fuse-ld=lld -Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot /home/bjorn/Documenten/cg_clif/build_sysroot/sysroot");
+            //cmd.env("RUSTFLAGS", "-Cpanic=abort -Clink-args=-fuse-ld=lld");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
+            cmd.env("CG_CLIF_INCR_CACHE", "1");
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

bjorn3 avatar Mar 14 '20 19:03 bjorn3

Although there are still regressions, they are almost entirely found in the tiny stress-test benchmarks. Most real-world benchmarks are seeing fantastic improvements!

Wonderful work, @bjorn3!

vultix avatar Mar 14 '20 20:03 vultix

There are a few places where a non-stress-test benchmark regresses by a few percent in one of the incremental benchmarks. Other than that, many stress-test benchmarks regress because of slower linking. Improving this will benefit all other executable benchmarks too. For example, the helloworld-debug regression can be completely explained by longer linking times; in fact, the codegen part is faster for cg_clif.
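
The helloworld-debug observation can be illustrated with a small sketch that splits wall-time into a codegen phase and a link phase. The `Phases` type, the function name, and every number used with them are hypothetical and only serve to show the reasoning; they are not measurements from this run.

```rust
/// Wall-time of one benchmark split into phases, in seconds.
/// All numbers used with this type are illustrative, not measurements.
#[derive(Clone, Copy)]
struct Phases {
    codegen: f64,
    link: f64,
}

impl Phases {
    /// Total wall-time of both phases.
    fn total(self) -> f64 {
        self.codegen + self.link
    }
}

/// Returns true when `clif` is slower overall even though its codegen
/// phase is not slower, i.e. the regression comes entirely from linking.
fn regression_is_link_only(llvm: Phases, clif: Phases) -> bool {
    clif.total() > llvm.total() && clif.codegen <= llvm.codegen
}
```

For instance, with made-up values of 0.10 s codegen and 0.20 s linking for cg_llvm against 0.05 s codegen and 0.30 s linking for cg_clif, the total regresses while codegen is twice as fast, so the function returns true.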

bjorn3 avatar Mar 14 '20 20:03 bjorn3

Reran the benchmarks with Firefox and VS Code closed. Now only the patched incremental run of regression-31157-debug is a significant regression:

image

bjorn3 avatar Mar 15 '20 22:03 bjorn3

With such huge improvements, how much work would you say is left for MVP?

vultix avatar Mar 15 '20 22:03 vultix

There are still missing features as mentioned in https://hackmd.io/@bjorn3/HJL5ryFS8. I don't know how long it will take to implement most of them. Some are hard, while others are less hard.

bjorn3 avatar Mar 15 '20 22:03 bjorn3

Are there any recent rustc-perf runs? I'm especially curious about the JIT mode.

NotAFile avatar Oct 11 '21 23:10 NotAFile

Not recently. Don't expect the JIT mode to be faster than AOT compilation. The JIT mode currently doesn't support incremental compilation, which makes it slower.

bjorn3 avatar Oct 12 '21 05:10 bjorn3

Here is the latest, using commit df7f02072b64712e5322ea70675135cb1e20bf80:

image: rustc-perf comparison page (start=LLVM, end=CG_CLIF, stat=wall-time)

CG_CLIF

diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ec71984f 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,21 @@ impl<'a> CargoProcess<'a> {
                 };
 
             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.env(
+                "RUSTFLAGS",
+                "-Zcodegen-backend=/home/jasew/workspace/rustc_codegen_cranelift/build/lib/librustc_codegen_cranelift.so",
+            );
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

LLVM

diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ca34d0a3 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,17 @@ impl<'a> CargoProcess<'a> {
                 };
 
             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.arg("-j1");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

Notes:

  • I needed perf; https://gist.github.com/abel0b/b1881e41b9e1c4b16d84e5e083c38a13 worked fine.
  • rustc-perf: https://github.com/rust-lang/rustc-perf

Processor: AMD Ryzen 9 5950X 16-Core Processor, 3.40 GHz. Installed RAM: 32.0 GB.

jasonwilliams avatar Dec 08 '21 21:12 jasonwilliams

cc https://github.com/bjorn3/rustc_codegen_cranelift/pull/1271

bjorn3 avatar Aug 25 '22 17:08 bjorn3