Merge basic blocks where possible when generating LLVM IR.
r? @ghost
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit a48b4cc08278a9a37892cc745a38b1cbfbf29340 with merge 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5...
The job `mingw-check` failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions := True
configure: dist.missing-tools := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure:
configure: run `python /checkout/x.py --help`
Attempting with retry: make prepare
---
skip untracked path cpu-usage.csv during rustfmt invocations
skip untracked path src/doc/book/ during rustfmt invocations
skip untracked path src/doc/rust-by-example/ during rustfmt invocations
skip untracked path src/llvm-project/ during rustfmt invocations
Diff in /checkout/compiler/rustc_codegen_ssa/src/mir/block.rs at line 131:
fx: &mut FunctionCx<'a, 'tcx, Bx>,
bx: &mut Bx,
- follow_on: bool
+ follow_on: bool,
) -> bool {
// njn: duplicated stuff from .lltarget()
// njn: also, some non-(None,None) cases in .lltarget() that could be used here
Diff in /checkout/compiler/rustc_codegen_ssa/src/mir/block.rs at line 834:
if intrinsic == Some(sym::caller_location) {
if let Some(target) = target {
- let location = self
- .get_caller_location(bx, mir::SourceInfo { span: fn_span, ..source_info });
+ let location =
+ self.get_caller_location(bx, mir::SourceInfo { span: fn_span, ..source_info });
if let ReturnDest::IndirectOperand(tmp, _) = ret_dest {
location.val.store(bx, tmp);
Diff in /checkout/compiler/rustc_codegen_ssa/src/mir/block.rs at line 1019:
self.codegen_argument(bx, op, &mut llargs, &fn_abi.args[i]);
}
let num_untupled = untuple.map(|tup| {
- self.codegen_arguments_untupled(
- bx,
- tup,
- &mut llargs,
- &fn_abi.args[first_args.len()..],
- )
+ self.codegen_arguments_untupled(bx, tup, &mut llargs, &fn_abi.args[first_args.len()..])
let needs_location =
Diff in /checkout/compiler/rustc_codegen_ssa/src/mir/block.rs at line 1330:
target,
cleanup,
fn_span,
- follow_on
+ follow_on,
)
}
mir::TerminatorKind::GeneratorDrop | mir::TerminatorKind::Yield { .. } => {
Running `"/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/rustfmt" "--config-path" "/checkout" "--edition" "2021" "--unstable-features" "--skip-children" "--check" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/mod.rs" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/suggestions.rs" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/_impl.rs" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/arg_matrix.rs" "/checkout/compiler/rustc_hir_analysis/src/check/fn_ctxt/checks.rs" "/checkout/compiler/rustc_hir_analysis/src/check/method/prelude2021.rs" "/checkout/compiler/rustc_codegen_ssa/src/mir/block.rs" "/checkout/compiler/rustc_codegen_ssa/src/base.rs"` failed.
If you're running `tidy`, try again with `--bless`. Or, if you just want to format code, run `./x.py fmt` instead.
:sunny: Try build successful - checks-actions
Build commit: 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5)
Queued 0a35b2797788a7dd1063c4b0155bc4ade8ec24f5 with parent 1536ab1b383f21b38f8d49230a2aecc51daffa3d, future comparison URL.
Finished benchmarking commit (0a35b2797788a7dd1063c4b0155bc4ade8ec24f5): comparison URL.
Overall result: ❌ regressions - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with `@rustbot label: +perf-regression-triaged` along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| | mean[^1] | range | count[^2] |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [0.8%, 1.2%] | 6 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.8% | [-0.3%, 1.2%] | 7 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean[^1] | range | count[^2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 9.2% | [9.2%, 9.2%] | 1 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |
| Improvements ✅ (secondary) | -2.5% | [-3.2%, -2.1%] | 4 |
| All ❌✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |
Cycles
This benchmark run did not return any relevant results for this metric.
[^1]: the arithmetic mean of the percent change
[^2]: number of relevant changes
The instruction count results aren't a win, but there are hints of goodness in the results for cycles, wall-time, max-rss, and especially binary size. The current version only merges the simplest cases, and there are quite a few more cases that can be handled, so I will continue working on them.
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 165b498be31961a522cedd64bb9bbe33c072d0f4 with merge 61e75799adaa22db3b3d115e5c1d921210da60ad...
:sunny: Try build successful - checks-actions
Build commit: 61e75799adaa22db3b3d115e5c1d921210da60ad (61e75799adaa22db3b3d115e5c1d921210da60ad)
Queued 61e75799adaa22db3b3d115e5c1d921210da60ad with parent 98a5ac269cffada469753ad2416717e251863f9a, future comparison URL.
Finished benchmarking commit (61e75799adaa22db3b3d115e5c1d921210da60ad): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with `@rustbot label: +perf-regression-triaged` along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| | mean[^1] | range | count[^2] |
|---|---|---|---|
| Regressions ❌ (primary) | 0.9% | [0.4%, 1.3%] | 7 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 2 |
| Improvements ✅ (secondary) | -0.3% | [-0.3%, -0.3%] | 1 |
| All ❌✅ (primary) | 0.6% | [-0.3%, 1.3%] | 9 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean[^1] | range | count[^2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.0% | [2.2%, 6.3%] | 5 |
| Improvements ✅ (primary) | -1.5% | [-2.7%, -0.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.5% | [-2.7%, -0.3%] | 2 |
Cycles
This benchmark run did not return any relevant results for this metric.
[^1]: the arithmetic mean of the percent change
[^2]: number of relevant changes
Disappointing results here. The code is working as intended, and is merging lots of basic blocks. Here are some measurements for three metrics:
- `wc`: size of LLVM IR as measured by running `wc -l` on the `.ll` output.
- `llvm-lines`: size of LLVM IR as measured by `cargo llvm-lines`.
- `br label`: number of `br label %bbN` instructions in the LLVM IR.
All measurements are for debug builds.
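The exact command used to obtain the `br label` counts is not stated. As a rough illustration of what that metric captures, here is a minimal sketch that counts unconditional branches in textual LLVM IR. The function name `count_unconditional_branches` is hypothetical, and the sketch assumes the usual `.ll` layout with one instruction per line.

```rust
/// Counts unconditional branches (`br label %...`) in textual LLVM IR.
///
/// Conditional branches such as `br i1 %c, label %a, label %b` are not
/// counted, because their line does not start with `br label %`.
/// Assumption: one instruction per line, as in typical `.ll` output.
pub fn count_unconditional_branches(ir: &str) -> usize {
    ir.lines()
        .filter(|line| line.trim_start().starts_with("br label %"))
        .count()
}
```

Reading a `.ll` file with `std::fs::read_to_string` and passing its contents to this function, once for the "before" build and once for the "after" build, would give numbers comparable in spirit to the `br label` column.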
| crate | `wc` before | `wc` after | `llvm-lines` before | `llvm-lines` after | `br label` before | `br label` after |
|---|---|---|---|---|---|---|
| clap-3.1.6 | 657,418 | 629,719 (-4.3%) | 296,511 | 287,343 (-3.1%) | 22,001 | 12,848 (-42%) |
| regex-1.5.5 | 464,556 | 450,134 (-4.1%) | 142,199 | 137,092 (-3.6%) | 11,471 | 6,720 (-41%) |
| ripgrep-13.0.0 | 608,307 | 577,649 (-5.1%) | 257,134 | 246,471 (-4.1%) | 23,942 | 13,783 (-42%) |
| syn-1.0.89 | 410,964 | 393,340 (-4.3%) | 171,194 | 165,376 (-3.4%) | 13,361 | 7,598 (-43%) |
Plenty of shrinkage, but the effect on compile times is negligible, or even a slight regression (for instruction counts) in some cases. The only good news is that the binary size of debug builds shrank by a small amount in many cases, which makes sense, but it doesn't feel like enough of a benefit to continue pushing on this.
To summarize:
- MIR uses one definition of BBs, and LLVM IR uses another. Most notably, function calls end a MIR BB but don't end an LLVM IR BB.
- rustc generates reasonable MIR code.
- rustc does a 1-to-1 translation of MIR BBs to LLVM IR BBs, which is reasonable.
- The resulting LLVM IR looks a bit silly and quite sub-optimal, with many unconditional BB-to-BB jumps, because of the different BB definition.
- The sub-optimality doesn't end up mattering much in terms of compiler perf.
- The sub-optimality also doesn't matter for the output of opt builds, because LLVM can optimize away the extra jumps and the output ends up the same.
- The sub-optimality matters slightly for the output of debug builds, because it causes binaries to be about 0.5% bigger. It may also make them slightly slower, though I haven't measured that and I suspect the effect would be very small, probably less than 0.5%.
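To make the BB mismatch concrete, here is a toy sketch of the kind of merging decision involved. It is not the code from this PR: `Terminator`, `Block`, and `mergeable_into_predecessor` are hypothetical names, and `Goto` stands in for any terminator with a single normal successor (in MIR a call ends a block, so the block that follows a call is often reached by exactly one edge). The sketch assumes block 0 is the entry and that every block is reachable from it.

```rust
/// Toy control-flow terminators; block targets are indices into the block list.
#[derive(Clone, Debug, PartialEq)]
pub enum Terminator {
    Goto(usize),
    Branch(usize, usize),
    Return,
}

/// A toy basic block: some statements followed by one terminator.
#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    pub stmts: Vec<String>,
    pub term: Terminator,
}

fn successors(term: &Terminator) -> Vec<usize> {
    match term {
        Terminator::Goto(t) => vec![*t],
        Terminator::Branch(a, b) => vec![*a, *b],
        Terminator::Return => vec![],
    }
}

/// For each block, reports whether it could be emitted directly into the
/// block of its sole predecessor instead of getting a block of its own.
///
/// A block qualifies when its only incoming edge is an unconditional jump.
/// The entry block (index 0) and self-loops never qualify.
pub fn mergeable_into_predecessor(blocks: &[Block]) -> Vec<bool> {
    // Count incoming edges for every block.
    let mut pred_count = vec![0usize; blocks.len()];
    for block in blocks {
        for succ in successors(&block.term) {
            pred_count[succ] += 1;
        }
    }

    let mut merge = vec![false; blocks.len()];
    for (i, block) in blocks.iter().enumerate() {
        if let Terminator::Goto(target) = block.term {
            if target != 0 && target != i && pred_count[target] == 1 {
                merge[target] = true;
            }
        }
    }
    merge
}
```

Every `true` entry corresponds to one `br label %bbN` that a 1-to-1 translation would emit and a merging translation would not. For a four-block function where block 0 jumps to block 1, block 1 branches to blocks 2 and 3, and block 2 jumps to block 3, only block 1 qualifies: block 3 has two predecessors, so its label must stay.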
Some changes occurred in compiler/rustc_codegen_gcc
cc @antoyo
I'm reopening this for further consideration. Even though it didn't make much difference to compiler perf, which was the original motivation, it might still be worth merging.
Pros:
- Small (mostly <1%) binary size improvements, mostly for debug builds.
- Generated LLVM IR is easier to read, due to not having unnecessary branches.
- Generated binary is easier to debug at the machine code level, due to not having unnecessary branches (cc @Amanieu).
Cons:
- Some extra complexity in codegen.
- It's a little harder to see the MIR-to-LLVM-IR mapping.
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 9ca699a18569ad6117e593295e8da10d87c58db3 with merge cfe97392ed166ef3911f1539074b18cfe078baa5...
r? @bjorn3
:sunny: Try build successful - checks-actions
Build commit: cfe97392ed166ef3911f1539074b18cfe078baa5 (cfe97392ed166ef3911f1539074b18cfe078baa5)
Queued cfe97392ed166ef3911f1539074b18cfe078baa5 with parent bc2504a83ca6ee8f6717dedd0721b90ffcbe1300, future comparison URL.
Finished benchmarking commit (cfe97392ed166ef3911f1539074b18cfe078baa5): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-1.5%, -0.2%] | 10 |
| Improvements ✅ (secondary) | -0.4% | [-0.4%, -0.3%] | 2 |
| All ❌✅ (primary) | -0.7% | [-1.5%, -0.2%] | 10 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 5.5% | [5.5%, 5.5%] | 1 |
| Improvements ✅ (primary) | -1.6% | [-2.6%, -0.1%] | 3 |
| Improvements ✅ (secondary) | -4.0% | [-4.0%, -4.0%] | 1 |
| All ❌✅ (primary) | -1.6% | [-2.6%, -0.1%] | 3 |
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.0% | [-3.0%, -3.0%] | 1 |
| All ❌✅ (primary) | - | - | 0 |