Use `DeepRejectCtxt` to quickly reject `ParamEnv` candidates
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 77fc8536f35c5e4bbd7c0a6b304f8f2581dc1ea0 with merge 7a0b60e0de7dfef4ed18ad00b864a144909632d3...
:sunny: Try build successful - checks-actions
Build commit: 7a0b60e0de7dfef4ed18ad00b864a144909632d3 (7a0b60e0de7dfef4ed18ad00b864a144909632d3)
Queued 7a0b60e0de7dfef4ed18ad00b864a144909632d3 with parent 899eb03926be23f2e5d2ffcaa1d6f9ac40af7f13, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.2 hours until the benchmark run finishes.
Finished benchmarking commit (7a0b60e0de7dfef4ed18ad00b864a144909632d3): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 6.3% | [0.2%, 12.7%] | 18 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.8% | [-4.2%, -0.3%] | 15 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.6% | [-4.2%, 12.7%] | 33 |
Max RSS (memory usage)
Results (primary -3.7%, secondary 2.9%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.9% | [2.9%, 2.9%] | 1 |
| Improvements ✅ (primary) | -3.7% | [-5.4%, -2.6%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.7% | [-5.4%, -2.6%] | 3 |
Cycles
Results (primary 2.3%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.5% | [3.3%, 7.4%] | 13 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.8% | [-2.5%, -1.4%] | 7 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.3% | [-2.5%, 7.4%] | 20 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 759.636s -> 760.059s (0.06%) Artifact size: 337.10 MiB -> 337.13 MiB (0.01%)
A small statistical observation:

`bitmaps-3.1.0` benchmark (12.70%):
- successful rejects in `assemble_candidates_from_caller_bounds`: 10/58 (~17%)
- successful rejects in `project`: 1/1032 (~0.1%)

`typenum-1.17.0` benchmark (9.59%):
- successful rejects in `assemble_candidates_from_caller_bounds`: 7286/12526 (~58%)
- successful rejects in `project`: 115/518 (~22%)

`diesel-1.4.8` benchmark (-4.17%):
- successful rejects in `assemble_candidates_from_caller_bounds`: 1405482/1558566 (~90%)
- successful rejects in `project`: 159641/168826 (~95%)

`std`:
- successful rejects in `assemble_candidates_from_caller_bounds`: 212125/285199 (~74%)
- successful rejects in `project`: 5754/25544 (~22%)

We waste time more often in `project`, so we could try removing `DeepRejectCtxt` from there.
The impact on diesel seems quite desirable, however. I think we can improve the performance of fast reject significantly enough that even the ~20% hit rate from typenum becomes a perf improvement.
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit ea66e5bafed1ea059be520ba98b7cbbdf46d719e with merge bb463aad9cfa00799eb511d3c91c56c235661f6b...
Ah, also: the bitmaps regression is almost certainly not caused by the slightly more than 1000 uses of deep reject. I expect that the worse performance of `DeepRejectCtxt` at other use sites (rejecting impls for trait goals) is the main cause of the regression.
:sunny: Try build successful - checks-actions
Build commit: bb463aad9cfa00799eb511d3c91c56c235661f6b (bb463aad9cfa00799eb511d3c91c56c235661f6b)
Queued bb463aad9cfa00799eb511d3c91c56c235661f6b with parent e08b80c0fb7667bdcd040761891701e576c42ec8, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.2 hours until the benchmark run finishes.
Finished benchmarking commit (bb463aad9cfa00799eb511d3c91c56c235661f6b): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.6% | [0.3%, 6.1%] | 14 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.7% | [-4.0%, -0.3%] | 16 |
| Improvements ✅ (secondary) | -0.5% | [-0.5%, -0.4%] | 3 |
| All ❌✅ (primary) | 0.8% | [-4.0%, 6.1%] | 30 |
Max RSS (memory usage)
Results (primary 2.7%, secondary 0.3%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.7% | [2.7%, 2.7%] | 1 |
| Regressions ❌ (secondary) | 2.2% | [1.6%, 2.8%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.6% | [-3.6%, -3.6%] | 1 |
| All ❌✅ (primary) | 2.7% | [2.7%, 2.7%] | 1 |
Cycles
Results (primary -1.8%, secondary 2.2%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.2% | [2.1%, 2.4%] | 2 |
| Improvements ✅ (primary) | -1.8% | [-2.3%, -1.4%] | 7 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.8% | [-2.3%, -1.4%] | 7 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 753.452s -> 753.337s (-0.02%) Artifact size: 341.39 MiB -> 341.43 MiB (0.01%)
Can you stash the commit adding all the new uses (or open a separate PR with these new uses removed) to check the perf impact of the refactor of `DeepRejectCtxt` itself?
`ty::Bound`, `ty::FnDef`, `ty::Closure`, `ty::CoroutineClosure`, and `ty::Coroutine` were always rejected previously. This is no longer the case, as they can now be encountered in the `param_env`. This probably explains the regression when rejecting impls. We could try adding an extra field to `DeepRejectCtxt` that takes the use site into account.
One option there is to outline all the very rare branches of the match into a `#[cold] #[inline(never)]` function.
@rustbot author
Results of local perf experiments
As a starting point, I measured the perf impact caused only by the refactoring changes in `DeepRejectCtxt` (same as https://github.com/rust-lang/rust/pull/129051, but run locally):
| Range | Mean | Count |
|---|---|---|
| [-0.68%, 1.61%] | +0.84% | 11 |
Next, I collected statistics on the frequency of the branches and reordered them so that the most frequent ones come first. It is difficult to get consistent results here, as some branches depend on the order. For example, `(Rigid, _) | (_, Rigid) => false` is one of the most frequent cases, but it has to stay last. The best result achieved:
| Range | Mean | Count |
|---|---|---|
| [0.33%, 1.08%] | +0.61% | 3 |
Next, I made `DeepRejectCtxt` generic over `TreatParams`:
| Range | Mean | Count |
|---|---|---|
| [-1.43%, -0.61%] | -1.00% | 12 |
I'm not sure why the `ty::Error` branch was returning `true`, but after removing it (moving it into the final `... => false` arm), perf improved:
| Range | Mean | Count |
|---|---|---|
| [ -1.83%, -0.70%] | -1.09% | 12 |
Among the benchmarks, only bitmaps and typenum were used.
> Not sure why the `ty::Error` branch was returning `true`, but after removing it (moving to the `... => false` in the end), the perf got better:
>
> | Range | Mean | Count |
> |---|---|---|
> | [-1.83%, -0.70%] | -1.09% | 12 |
>
> Among the benchmarks, only `bitmaps` and `typenum` were used.
This is why `ty::Error` returns `true`: returning `false` would mean that `DeepRejectCtxt` impacts behavior (even if only on the error path).
https://github.com/rust-lang/rust/blob/8fbdc04f1b2d6280c18f9e5458253bbe8e184a70/compiler/rustc_infer/src/infer/relate/type_relating.rs#L131-L134
@rustbot ready
The job x86_64-gnu-llvm-17 failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
------
> importing cache manifest from ghcr.io/rust-lang/rust-ci-cache:20d3b4d4a2629cbf7865cdbf92fe47512a7c96658c24253a045ff38e8075cd7fb37ca6fcadfa6e6d093333943ad24f6fc4f163ec5b74fd940de9d5bb03eb4d3b:
------
##[endgroup]
Setting extra environment values for docker: --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-17]
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure:
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-17', '--enable-llvm-link-shared', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-17/bin/llvm-config
configure: llvm.link-shared := True
configure: rust.thin-lto-import-instr-limit := 10
configure: change-id := 99999999
---
---- [ui] tests/ui/traits/vtable/issue-91807.rs stdout ----
error: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/ui/traits/vtable/issue-91807.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--check-cfg" "cfg(FALSE)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807/issue-91807.inc" "-Z" "incremental-verify-ich" "--error-format" "json" "--json" "future-incompat" "-Ccodegen-units=1" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zwrite-long-types-to-disk=no" "-Cstrip=debuginfo" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807" "-A" "unused" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807/auxiliary"
--- stderr -------------------------------
error[E0161]: cannot move a value of type `<dyn Fn(i32) as FnOnce<(i32,)>>::Output`
|
LL | f(0);
| ^^^^ the size of `<dyn Fn(i32) as FnOnce<(i32,)>>::Output` cannot be statically determined
error: aborting due to 1 previous error
For more information about this error, try `rustc --explain E0161`.
------------------------------------------
oops
@rustbot author
@rustbot ready
:umbrella: The latest upstream changes (presumably #128252) made this pull request unmergeable. Please resolve the merge conflicts.
I'd like to see another rebase + perf run before merging this. I will only get to reviewing it by the end of this week or the start of the next.
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 2220d99f41b8b1029ab363ba43c3f72c36e9a037 with merge ca72c587d056bf0f39ef4d6ac03c45705c7924f4...