rust Merge basic blocks where possible when generating LLVM IR.

r? @ghost

Oct 17 '22 07:10 nnethercote

@bors try @rust-timer queue

Oct 17 '22 07:10 nnethercote

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Oct 17 '22 07:10 rust-timer

:hourglass: Trying commit a48b4cc08278a9a37892cc745a38b1cbfbf29340 with merge 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5...

Oct 17 '22 07:10 bors

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
Attempting with retry: make prepare
---
skip untracked path cpu-usage.csv during rustfmt invocations
skip untracked path src/doc/book/ during rustfmt invocations
skip untracked path src/doc/rust-by-example/ during rustfmt invocations
skip untracked path src/llvm-project/ during rustfmt invocations
Diff in /checkout/compiler/rustc_codegen_ssa/src/mir/block.rs at line 131:
         fx: &mut FunctionCx<'a, 'tcx, Bx>,
         bx: &mut Bx,
-        follow_on: bool
+        follow_on: bool,
     ) -> bool {
         // njn: duplicated stuff from .lltarget()
         // njn: also, some non-(None,None) cases in .lltarget() that could be used here
Diff in /checkout/compiler/rustc_codegen_ssa/src/mir/block.rs at line 834:
         if intrinsic == Some(sym::caller_location) {
             if let Some(target) = target {
-                let location = self
-                    .get_caller_location(bx, mir::SourceInfo { span: fn_span, ..source_info });
+                let location =
+                    self.get_caller_location(bx, mir::SourceInfo { span: fn_span, ..source_info });
 
                 if let ReturnDest::IndirectOperand(tmp, _) = ret_dest {
                     location.val.store(bx, tmp);
Diff in /checkout/compiler/rustc_codegen_ssa/src/mir/block.rs at line 1019:
             self.codegen_argument(bx, op, &mut llargs, &fn_abi.args[i]);
         }
         let num_untupled = untuple.map(|tup| {
-            self.codegen_arguments_untupled(
-                bx,
-                tup,
-                &mut llargs,
-                &fn_abi.args[first_args.len()..],
-            )
+            self.codegen_arguments_untupled(bx, tup, &mut llargs, &fn_abi.args[first_args.len()..])
 
         let needs_location =
Diff in /checkout/compiler/rustc_codegen_ssa/src/mir/block.rs at line 1330:
                     target,
                     cleanup,
                     fn_span,
-                    follow_on
+                    follow_on,
                 )
             }
             mir::TerminatorKind::GeneratorDrop | mir::TerminatorKind::Yield { .. } => {
Running `"/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/rustfmt" "--config-path" "/checkout" "--edition" "2021" "--unstable-features" "--skip-children" "--check" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/mod.rs" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/suggestions.rs" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/_impl.rs" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/arg_matrix.rs" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/checks.rs" "/checkout/compiler/rustc_hir_analysis/src/check/method/prelude2021.rs" "/checkout/compiler/rustc_codegen_ssa/src/mir/block.rs" "/checkout/compiler/rustc_codegen_ssa/src/base.rs"` failed.
If you're running `tidy`, try again with `--bless`. Or, if you just want to format code, run `./x.py fmt` instead.

Oct 17 '22 07:10 rust-log-analyzer

:sunny: Try build successful - checks-actions Build commit: 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5)

Oct 17 '22 09:10 bors

Queued 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 with parent 1536ab1b383f21b38f8d49230a2aecc51daffa3d, future comparison URL.

Oct 17 '22 09:10 rust-timer

Finished benchmarking commit (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean[^1] | range | count[^2] |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 1.0% | [0.8%, 1.2%] | 6 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.8% | [-0.3%, 1.2%] | 7 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean[^1] | range | count[^2] |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 9.2% | [9.2%, 9.2%] | 1 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |
| Improvements ✅ (secondary) | -2.5% | [-3.2%, -2.1%] | 4 |
| All ❌✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |

Cycles

This benchmark run did not return any relevant results for this metric.

[^1]: the arithmetic mean of the percent change
[^2]: number of relevant changes

Oct 17 '22 10:10 rust-timer

The instruction count results aren't a win, but there are hints of goodness in the results for cycles, wall-time, max-rss, and especially binary size. The current version only merges the simplest cases, and there are quite a few more cases that can be handled, so I will continue working on them.

Oct 17 '22 11:10 nnethercote

@bors try @rust-timer queue

Oct 18 '22 06:10 nnethercote

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Oct 18 '22 06:10 rust-timer

:hourglass: Trying commit 165b498be31961a522cedd64bb9bbe33c072d0f4 with merge 61e75799adaa22db3b3d115e5c1d921210da60ad...

Oct 18 '22 06:10 bors

:sunny: Try build successful - checks-actions Build commit: 61e75799adaa22db3b3d115e5c1d921210da60ad (61e75799adaa22db3b3d115e5c1d921210da60ad)

Oct 18 '22 09:10 bors

Queued 61e75799adaa22db3b3d115e5c1d921210da60ad with parent 98a5ac269cffada469753ad2416717e251863f9a, future comparison URL.

Oct 18 '22 09:10 rust-timer

Finished benchmarking commit (61e75799adaa22db3b3d115e5c1d921210da60ad): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean[^1] | range | count[^2] |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 0.9% | [0.4%, 1.3%] | 7 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 2 |
| Improvements ✅ (secondary) | -0.3% | [-0.3%, -0.3%] | 1 |
| All ❌✅ (primary) | 0.6% | [-0.3%, 1.3%] | 9 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean[^1] | range | count[^2] |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.0% | [2.2%, 6.3%] | 5 |
| Improvements ✅ (primary) | -1.5% | [-2.7%, -0.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.5% | [-2.7%, -0.3%] | 2 |

Cycles

This benchmark run did not return any relevant results for this metric.

[^1]: the arithmetic mean of the percent change
[^2]: number of relevant changes

Oct 18 '22 10:10 rust-timer

Disappointing results here. The code is working as intended, and is merging lots of basic blocks. Here are some measurements for three metrics:

- `wc`: size of the LLVM IR, measured by running `wc -l` on the `.ll` output.
- `llvm-lines`: size of the LLVM IR, measured by `cargo llvm-lines`.
- `br label`: number of `br label %bbN` instructions in the LLVM IR.

All measurements are for debug builds.
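
For readers who want to reproduce the `br label` metric, here is a minimal sketch of one way to count unconditional branches in textual LLVM IR. The thread does not say how the count was actually obtained, so the helper name `count_unconditional_branches` and the hand-written IR fragment are assumptions for illustration only.

```rust
/// Counts lines of the form `br label %...` (unconditional branches)
/// in textual LLVM IR. Conditional branches (`br i1 ...`) are not counted.
/// Hypothetical helper; not taken from rustc or from this PR.
fn count_unconditional_branches(llvm_ir: &str) -> usize {
    llvm_ir
        .lines()
        .filter(|line| line.trim_start().starts_with("br label %"))
        .count()
}

fn main() {
    // Hand-written fragment shaped like debug-build output, for illustration.
    let ir = "\
start:
  %x = call i32 @f()
  br label %bb1
bb1:
  %y = call i32 @g()
  br label %bb2
bb2:
  br i1 %c, label %bb3, label %bb4
";
    println!("{}", count_unconditional_branches(ir)); // prints 2
}
```

The prefix match is deliberately simple: it relies on the IR printer placing each instruction on its own line, which holds for `.ll` files emitted by rustc.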

-----------------------------------------------------------------------------
                wc                       llvm-lines               br label
-----------------------------------------------------------------------------
                before  after            before after             before after
-----------------------------------------------------------------------------
clap-3.1.6      657,418 629,719 (-4.3%)  296,511 287,343 (-3.1%)  22,001 12,848 (-42%)
regex-1.5.5     464,556 450,134 (-4.1%)  142,199 137,092 (-3.6%)  11,471  6,720 (-41%)
ripgrep-13.0.0  608,307 577,649 (-5.1%)  257,134 246,471 (-4.1%)  23,942 13,783 (-42%)
syn-1.0.89      410,964 393,340 (-4.3%)  171,194 165,376 (-3.4%)  13,361  7,598 (-43%)
-----------------------------------------------------------------------------

Plenty of shrinkage, but the effect on compile times is negligible, or even a slight regression (for instruction counts) in some cases. The only good news is that the binary size of debug builds shrank by a small amount in many cases, which makes sense, but it doesn't feel like enough of a benefit to continue pushing on this.
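
The parenthesised figures in the table read as the relative change from "before" to "after". A minimal sketch of that arithmetic follows, assuming that interpretation; `percent_change` is a hypothetical helper and the inputs are the `br label` counts for clap-3.1.6 from the table.

```rust
/// Relative change from `before` to `after`, in percent.
/// A negative value means the "after" measurement is smaller.
fn percent_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}

fn main() {
    // `br label` counts for clap-3.1.6, taken from the table.
    let change = percent_change(22_001, 12_848);
    println!("{change:.0}%"); // prints -42%
}
```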

Oct 19 '22 02:10 nnethercote

To summarize:

- MIR uses one definition of basic blocks (BBs), and LLVM IR uses another. Most notably, function calls end a MIR BB but don't end an LLVM IR BB.
- rustc generates reasonable MIR code.
- rustc does a 1-to-1 translation of MIR BBs to LLVM IR BBs, which is reasonable.
- The resulting LLVM IR looks a bit silly and quite sub-optimal, with many unconditional BB-to-BB jumps, because of the different BB definitions.
- The sub-optimality doesn't end up mattering much in terms of compiler perf.
- The sub-optimality also doesn't matter for the output of opt builds, because LLVM can optimize away the extra jumps and the output ends up the same.
- The sub-optimality matters slightly for the output of debug builds, because it causes binaries to be about 0.5% bigger. It may also make them slightly slower, though I haven't measured that and I suspect the effect would be very small, probably less than 0.5%.
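
To make the idea concrete, here is a toy sketch of block merging on a miniature control-flow graph. It is not the code from this PR and does not use any rustc types: `Terminator`, `Block`, `mergeable`, and `emit` are all hypothetical names, and the merge rule shown (a block is folded into its predecessor when that predecessor is its only one and ends in an unconditional jump to it) is the textbook condition, assumed here for illustration.

```rust
/// How a block ends. A toy stand-in for MIR terminators.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Terminator {
    Goto(usize),          // unconditional jump to a block index
    Branch(usize, usize), // two-way conditional branch
    Return,
}

#[derive(Debug, Clone, PartialEq)]
struct Block {
    stmts: Vec<String>,
    term: Terminator,
}

/// Marks each block that can be appended to its predecessor: it has exactly
/// one incoming edge, and that edge is an unconditional `Goto`.
fn mergeable(blocks: &[Block]) -> Vec<bool> {
    let mut preds = vec![0usize; blocks.len()];
    for b in blocks {
        match b.term {
            Terminator::Goto(t) => preds[t] += 1,
            Terminator::Branch(a, c) => {
                preds[a] += 1;
                preds[c] += 1;
            }
            Terminator::Return => {}
        }
    }
    let mut merge = vec![false; blocks.len()];
    for (i, b) in blocks.iter().enumerate() {
        if let Terminator::Goto(t) = b.term {
            // Block 0 is the entry and is never merged; neither is a self-loop.
            merge[t] = t != 0 && t != i && preds[t] == 1;
        }
    }
    merge
}

/// Emits LLVM-like text, skipping the label and the `br label` for every
/// merged block so its statements continue the predecessor's block.
fn emit(blocks: &[Block]) -> Vec<String> {
    let merge = mergeable(blocks);
    let mut out = Vec::new();
    for start in 0..blocks.len() {
        if merge[start] {
            continue; // emitted as part of its predecessor
        }
        out.push(format!("bb{start}:"));
        let mut cur = start;
        loop {
            out.extend(blocks[cur].stmts.iter().map(|s| format!("  {s}")));
            match blocks[cur].term {
                Terminator::Goto(t) if merge[t] => cur = t, // fall through
                Terminator::Goto(t) => {
                    out.push(format!("  br label %bb{t}"));
                    break;
                }
                Terminator::Branch(a, c) => {
                    out.push(format!("  br i1 %cond, label %bb{a}, label %bb{c}"));
                    break;
                }
                Terminator::Return => {
                    out.push("  ret void".to_string());
                    break;
                }
            }
        }
    }
    out
}

fn main() {
    // Two calls in a row: each call ends a block in this MIR-like model.
    let blocks = vec![
        Block { stmts: vec!["call @f()".into()], term: Terminator::Goto(1) },
        Block { stmts: vec!["call @g()".into()], term: Terminator::Goto(2) },
        Block { stmts: vec![], term: Terminator::Return },
    ];
    for line in emit(&blocks) {
        println!("{line}");
    }
}
```

For the three-block chain in `main`, the sketch emits a single `bb0` containing both calls followed by `ret void`, with no `br label` instructions. A block with two or more predecessors, such as the join point of an `if`/`else`, keeps its own label and its incoming branches.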

Oct 20 '22 03:10 nnethercote

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

Nov 09 '22 06:11 rustbot

I'm reopening this for further consideration. Even though it didn't make much difference to compiler perf, which was the original motivation, it might still be worth merging.

Pros:

- Small (mostly <1%) binary size improvements, mostly for debug builds.
- The generated LLVM IR is easier to read, due to not having unnecessary branches.
- The generated binary is easier to debug at the machine code level, due to not having unnecessary branches (cc @Amanieu).

Cons:

- Some extra complexity in codegen.
- It's a little harder to see the MIR-to-LLVM-IR mapping.

Nov 09 '22 06:11 nnethercote

@bors try @rust-timer queue

Nov 09 '22 06:11 nnethercote

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Nov 09 '22 06:11 rust-timer

:hourglass: Trying commit 9ca699a18569ad6117e593295e8da10d87c58db3 with merge cfe97392ed166ef3911f1539074b18cfe078baa5...

Nov 09 '22 06:11 bors

r? @bjorn3

Nov 09 '22 08:11 nnethercote

:sunny: Try build successful - checks-actions Build commit: cfe97392ed166ef3911f1539074b18cfe078baa5 (cfe97392ed166ef3911f1539074b18cfe078baa5)

Nov 09 '22 08:11 bors

Queued cfe97392ed166ef3911f1539074b18cfe078baa5 with parent bc2504a83ca6ee8f6717dedd0721b90ffcbe1300, future comparison URL.

Nov 09 '22 08:11 rust-timer

Finished benchmarking commit (cfe97392ed166ef3911f1539074b18cfe078baa5): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-1.5%, -0.2%] | 10 |
| Improvements ✅ (secondary) | -0.4% | [-0.4%, -0.3%] | 2 |
| All ❌✅ (primary) | -0.7% | [-1.5%, -0.2%] | 10 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 5.5% | [5.5%, 5.5%] | 1 |
| Improvements ✅ (primary) | -1.6% | [-2.6%, -0.1%] | 3 |
| Improvements ✅ (secondary) | -4.0% | [-4.0%, -4.0%] | 1 |
| All ❌✅ (primary) | -1.6% | [-2.6%, -0.1%] | 3 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.0% | [-3.0%, -3.0%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Nov 09 '22 10:11 rust-timer
