Use `DeepRejectCtxt` to quickly reject `ParamEnv` candidates
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 77fc8536f35c5e4bbd7c0a6b304f8f2581dc1ea0 with merge 7a0b60e0de7dfef4ed18ad00b864a144909632d3...
:sunny: Try build successful - checks-actions
Build commit: 7a0b60e0de7dfef4ed18ad00b864a144909632d3 (7a0b60e0de7dfef4ed18ad00b864a144909632d3)
Queued 7a0b60e0de7dfef4ed18ad00b864a144909632d3 with parent 899eb03926be23f2e5d2ffcaa1d6f9ac40af7f13, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.2 hours until the benchmark run finishes.
Finished benchmarking commit (7a0b60e0de7dfef4ed18ad00b864a144909632d3): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 6.3% | [0.2%, 12.7%] | 18 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.8% | [-4.2%, -0.3%] | 15 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.6% | [-4.2%, 12.7%] | 33 |
Max RSS (memory usage)
Results (primary -3.7%, secondary 2.9%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.9% | [2.9%, 2.9%] | 1 |
| Improvements ✅ (primary) | -3.7% | [-5.4%, -2.6%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.7% | [-5.4%, -2.6%] | 3 |
Cycles
Results (primary 2.3%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.5% | [3.3%, 7.4%] | 13 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.8% | [-2.5%, -1.4%] | 7 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.3% | [-2.5%, 7.4%] | 20 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 759.636s -> 760.059s (0.06%) Artifact size: 337.10 MiB -> 337.13 MiB (0.01%)
A small statistical observation:

`bitmaps-3.1.0` benchmark (12.70%):
- successful rejects in `assemble_candidates_from_caller_bounds`: 10/58 (~17%)
- successful rejects in `project`: 1/1032 (~0.1%)

`typenum-1.17.0` benchmark (9.59%):
- successful rejects in `assemble_candidates_from_caller_bounds`: 7286/12526 (~58%)
- successful rejects in `project`: 115/518 (~22%)

`diesel-1.4.8` benchmark (-4.17%):
- successful rejects in `assemble_candidates_from_caller_bounds`: 1405482/1558566 (~90%)
- successful rejects in `project`: 159641/168826 (~95%)

`std`:
- successful rejects in `assemble_candidates_from_caller_bounds`: 212125/285199 (~74%)
- successful rejects in `project`: 5754/25544 (~22%)

We waste time more often in `project`, so we could try removing `DeepRejectCtxt` from there.
The impact on diesel seems quite desirable, however. I think we can improve the performance of fast reject significantly enough that even the ~20% hit rate from typenum becomes a perf improvement.
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit ea66e5bafed1ea059be520ba98b7cbbdf46d719e with merge bb463aad9cfa00799eb511d3c91c56c235661f6b...
Ah, also: the bitmaps regression is almost certainly not caused by the slightly more than 1000 uses of deep reject. I expect that the worse performance of `DeepRejectCtxt` at other use sites (rejecting impls for trait goals) is the main cause of the regression.
:sunny: Try build successful - checks-actions
Build commit: bb463aad9cfa00799eb511d3c91c56c235661f6b (bb463aad9cfa00799eb511d3c91c56c235661f6b)
Queued bb463aad9cfa00799eb511d3c91c56c235661f6b with parent e08b80c0fb7667bdcd040761891701e576c42ec8, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.2 hours until the benchmark run finishes.
Finished benchmarking commit (bb463aad9cfa00799eb511d3c91c56c235661f6b): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.6% | [0.3%, 6.1%] | 14 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.7% | [-4.0%, -0.3%] | 16 |
| Improvements ✅ (secondary) | -0.5% | [-0.5%, -0.4%] | 3 |
| All ❌✅ (primary) | 0.8% | [-4.0%, 6.1%] | 30 |
Max RSS (memory usage)
Results (primary 2.7%, secondary 0.3%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.7% | [2.7%, 2.7%] | 1 |
| Regressions ❌ (secondary) | 2.2% | [1.6%, 2.8%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.6% | [-3.6%, -3.6%] | 1 |
| All ❌✅ (primary) | 2.7% | [2.7%, 2.7%] | 1 |
Cycles
Results (primary -1.8%, secondary 2.2%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.2% | [2.1%, 2.4%] | 2 |
| Improvements ✅ (primary) | -1.8% | [-2.3%, -1.4%] | 7 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.8% | [-2.3%, -1.4%] | 7 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 753.452s -> 753.337s (-0.02%) Artifact size: 341.39 MiB -> 341.43 MiB (0.01%)
Can you stash the commit adding all the new uses (or open a separate PR with these new uses removed) to check the perf impact of the refactor of `DeepRejectCtxt` itself?
`ty::Bound`, `ty::FnDef`, `ty::Closure`, `ty::CoroutineClosure`, and `ty::Coroutine` were always rejected previously. This is no longer the case, as they can now be encountered in the `param_env`. This probably explains the regression when rejecting impls. We could try adding an extra field to `DeepRejectCtxt` that takes the use site into account.
One option there is to outline all the very rare branches of the match into a `#[cold] #[inline(never)]` function.
@rustbot author
Results of local perf experiments
As a starting point, I measured the perf impact caused only by the refactoring changes in `DeepRejectCtxt` (same as https://github.com/rust-lang/rust/pull/129051, but run locally):
| Range | Mean | Count |
|---|---|---|
| [-0.68%, 1.61%] | +0.84% | 11 |
Next, I collected statistics on the frequency of the branches and reordered them so that the most frequent ones come first. It is difficult to get consistent results here, as some branches depend on the order. For example, `(Rigid, _) | (_, Rigid) => false` is one of the most frequent cases, but it has to stay last. The best result achieved:
| Range | Mean | Count |
|---|---|---|
| [0.33%, 1.08%] | +0.61% | 3 |
Next, I made `DeepRejectCtxt` generic over `TreatParams`:
| Range | Mean | Count |
|---|---|---|
| [-1.43%, -0.61%] | -1.00% | 12 |
I'm not sure why the `ty::Error` branch was returning `true`, but after removing it (moving it into the final `... => false` arm), perf improved:
| Range | Mean | Count |
|---|---|---|
| [ -1.83%, -0.70%] | -1.09% | 12 |
Among the benchmarks, only bitmaps and typenum were used.
> Not sure why the `ty::Error` branch was returning `true`, but after removing it (moving to the `... => false` in the end), the perf got better:
>
> | Range | Mean | Count |
> |---|---|---|
> | [-1.83%, -0.70%] | -1.09% | 12 |
>
> Among the benchmarks, only `bitmaps` and `typenum` were used.
This is why `ty::Error` returns `true`: returning `false` would mean that `DeepRejectCtxt` impacts behavior (even if only on the error path).
https://github.com/rust-lang/rust/blob/8fbdc04f1b2d6280c18f9e5458253bbe8e184a70/compiler/rustc_infer/src/infer/relate/type_relating.rs#L131-L134
@rustbot ready
The job x86_64-gnu-llvm-17 failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
------
> importing cache manifest from ghcr.io/rust-lang/rust-ci-cache:20d3b4d4a2629cbf7865cdbf92fe47512a7c96658c24253a045ff38e8075cd7fb37ca6fcadfa6e6d093333943ad24f6fc4f163ec5b74fd940de9d5bb03eb4d3b:
------
##[endgroup]
Setting extra environment values for docker: --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-17]
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure:
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-17', '--enable-llvm-link-shared', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-17/bin/llvm-config
configure: llvm.link-shared := True
configure: rust.thin-lto-import-instr-limit := 10
configure: change-id := 99999999
---
---- [ui] tests/ui/traits/vtable/issue-91807.rs stdout ----
error: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/ui/traits/vtable/issue-91807.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--check-cfg" "cfg(FALSE)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807/issue-91807.inc" "-Z" "incremental-verify-ich" "--error-format" "json" "--json" "future-incompat" "-Ccodegen-units=1" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zwrite-long-types-to-disk=no" "-Cstrip=debuginfo" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807" "-A" "unused" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807/auxiliary"
--- stderr -------------------------------
error[E0161]: cannot move a value of type `<dyn Fn(i32) as FnOnce<(i32,)>>::Output`
|
LL | f(0);
| ^^^^ the size of `<dyn Fn(i32) as FnOnce<(i32,)>>::Output` cannot be statically determined
error: aborting due to 1 previous error
For more information about this error, try `rustc --explain E0161`.
------------------------------------------
oops
@rustbot author
@rustbot ready
:umbrella: The latest upstream changes (presumably #128252) made this pull request unmergeable. Please resolve the merge conflicts.
I'd like to see another rebase + perf run before merging this. I will only get to reviewing it by the end of this week or the start of the next.
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 2220d99f41b8b1029ab363ba43c3f72c36e9a037 with merge ca72c587d056bf0f39ef4d6ac03c45705c7924f4...