rust icon indicating copy to clipboard operation
rust copied to clipboard

Use `DeepRejectCtxt` to quickly reject `ParamEnv` candidates

Open Bryanskiy opened this issue 1 year ago • 32 comments

The description is on the zulip thread

r? @lcnr

Bryanskiy avatar Aug 07 '24 11:08 Bryanskiy

@bors try @rust-timer queue

lcnr avatar Aug 09 '24 14:08 lcnr

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer avatar Aug 09 '24 14:08 rust-timer

:hourglass: Trying commit 77fc8536f35c5e4bbd7c0a6b304f8f2581dc1ea0 with merge 7a0b60e0de7dfef4ed18ad00b864a144909632d3...

bors avatar Aug 09 '24 14:08 bors

:sunny: Try build successful - checks-actions Build commit: 7a0b60e0de7dfef4ed18ad00b864a144909632d3 (7a0b60e0de7dfef4ed18ad00b864a144909632d3)

bors avatar Aug 09 '24 16:08 bors

Queued 7a0b60e0de7dfef4ed18ad00b864a144909632d3 with parent 899eb03926be23f2e5d2ffcaa1d6f9ac40af7f13, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.2 hours until the benchmark run finishes.

rust-timer avatar Aug 09 '24 16:08 rust-timer

Finished benchmarking commit (7a0b60e0de7dfef4ed18ad00b864a144909632d3): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
6.3% [0.2%, 12.7%] 18
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.8% [-4.2%, -0.3%] 15
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.6% [-4.2%, 12.7%] 33

Max RSS (memory usage)

Results (primary -3.7%, secondary 2.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.9% [2.9%, 2.9%] 1
Improvements ✅
(primary)
-3.7% [-5.4%, -2.6%] 3
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -3.7% [-5.4%, -2.6%] 3

Cycles

Results (primary 2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
4.5% [3.3%, 7.4%] 13
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.8% [-2.5%, -1.4%] 7
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.3% [-2.5%, 7.4%] 20

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 759.636s -> 760.059s (0.06%) Artifact size: 337.10 MiB -> 337.13 MiB (0.01%)

rust-timer avatar Aug 09 '24 18:08 rust-timer

A small statistical observation:

bitmaps-3.1.0 benchmark (12.70%):

  • successful rejects in assemble_candidates_from_caller_bounds: 10/58 (~17%)
  • successful rejects in project: 1/1032 (~0.1%)

typenum-1.17.0 benchmark (9.59%):

  • successful rejects in assemble_candidates_from_caller_bounds: 7286/12526 (~58%)
  • successful rejects in project: 115/518 (~22%)

diesel-1.4.8 benchmark (-4.17%):

  • successful rejects in assemble_candidates_from_caller_bounds: 1405482/1558566 (~90%)
  • successful rejects in project: 159641/168826 (~95%)

std:

  • successful rejects in assemble_candidates_from_caller_bounds: 212125/285199 (~74%)
  • successful rejects in project: 5754/25544 (~22%)

We waste time more often in project and we can try to remove DeepRejectCtxt from there.

Bryanskiy avatar Aug 12 '24 11:08 Bryanskiy

the impact on diesel seems quite desirable however. I think we can improve the performance of fast reject significantly so that even the ~20% hit rate from typenum is a perf improvement.

lcnr avatar Aug 12 '24 13:08 lcnr

@bors try @rust-timer queue

lcnr avatar Aug 12 '24 13:08 lcnr

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer avatar Aug 12 '24 13:08 rust-timer

:hourglass: Trying commit ea66e5bafed1ea059be520ba98b7cbbdf46d719e with merge bb463aad9cfa00799eb511d3c91c56c235661f6b...

bors avatar Aug 12 '24 13:08 bors

ah also, the bitmaps regression is quite certainly not caused by slightly more than 1000 uses of deep reject. I expect that the worse performance of the DeepRejectCtxt in other use-sites (rejecting impls for trait goals) is the main cause of the regression

lcnr avatar Aug 12 '24 13:08 lcnr

:sunny: Try build successful - checks-actions Build commit: bb463aad9cfa00799eb511d3c91c56c235661f6b (bb463aad9cfa00799eb511d3c91c56c235661f6b)

bors avatar Aug 12 '24 15:08 bors

Queued bb463aad9cfa00799eb511d3c91c56c235661f6b with parent e08b80c0fb7667bdcd040761891701e576c42ec8, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.2 hours until the benchmark run finishes.

rust-timer avatar Aug 12 '24 15:08 rust-timer

Finished benchmarking commit (bb463aad9cfa00799eb511d3c91c56c235661f6b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.6% [0.3%, 6.1%] 14
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.7% [-4.0%, -0.3%] 16
Improvements ✅
(secondary)
-0.5% [-0.5%, -0.4%] 3
All ❌✅ (primary) 0.8% [-4.0%, 6.1%] 30

Max RSS (memory usage)

Results (primary 2.7%, secondary 0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.7% [2.7%, 2.7%] 1
Regressions ❌
(secondary)
2.2% [1.6%, 2.8%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.6% [-3.6%, -3.6%] 1
All ❌✅ (primary) 2.7% [2.7%, 2.7%] 1

Cycles

Results (primary -1.8%, secondary 2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.2% [2.1%, 2.4%] 2
Improvements ✅
(primary)
-1.8% [-2.3%, -1.4%] 7
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.8% [-2.3%, -1.4%] 7

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 753.452s -> 753.337s (-0.02%) Artifact size: 341.39 MiB -> 341.43 MiB (0.01%)

rust-timer avatar Aug 12 '24 16:08 rust-timer

can you stash the commit adding all new uses (or open a separate PR with these new uses removed) to check the perf impact of the refactor of DeepRejectCtxt itself

lcnr avatar Aug 12 '24 21:08 lcnr

ty::Bound, ty::FnDef, ty::Closure, ty::CoroutineClosure, ty::Coroutine have always been rejected previously. This is not the case anymore, as they can now be encountered in param_env. This is probably explains the regression when rejecting impls. We could try adding an extra field to DeepRejectCtxt that takes use-site into account.

Bryanskiy avatar Aug 13 '24 11:08 Bryanskiy

one option there is to outline all the very rare branches of the match into a #[cold] #[inline(never)] function

lcnr avatar Aug 13 '24 13:08 lcnr

@rustbot author

lcnr avatar Aug 14 '24 06:08 lcnr

results of local perf experiments

As a starting point i obtained perf impact caused only by refactoring changes in DeepRejectCtxt (same as https://github.com/rust-lang/rust/pull/129051, but only was running locally):

Range Mean Count
[-0.68%, 1.61%] +0.84% 11

Next, i collected statistics on the frequency of branches and reordered them so that the most frequent ones went first. It is difficult to get any results here, as some branches depend on the order. For example, (Rigid, _) | (_, Rigid) => false is one of the most frequent case, but it should be the last. The best result that has been achieved:

Range Mean Count
[0.33%, 1.08%] +0.61% 3

Next, i made DeepRejectCtxt generic with respect to the TreatParams:

Range Mean Count
[-1.43%, -0.61%] -1.00% 12

Not sure why the ty::Error branch was returning true, but after removing it (moving to the ... => false in the end), the perf got better:

Range Mean Count
[ -1.83%,  -0.70%] -1.09% 12

Among the benchmarks, only bitmaps and typenum were used.

Bryanskiy avatar Aug 15 '24 15:08 Bryanskiy

Not sure why the ty::Error branch was returning true, but after removing it (moving to the ... => false in the end), the perf got better: Range Mean Count [ -1.83%, -0.70%] -1.09% 12

Among the benchmarks, only bitmaps and typenum were used.

This is why ty::Error returns true. Returning false means that the DeepRejectCtxt impacts behavior (even if only on the error path)

https://github.com/rust-lang/rust/blob/8fbdc04f1b2d6280c18f9e5458253bbe8e184a70/compiler/rustc_infer/src/infer/relate/type_relating.rs#L131-L134

lcnr avatar Aug 16 '24 13:08 lcnr

@rustbot ready

Bryanskiy avatar Aug 19 '24 14:08 Bryanskiy

The job x86_64-gnu-llvm-17 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
------
 > importing cache manifest from ghcr.io/rust-lang/rust-ci-cache:20d3b4d4a2629cbf7865cdbf92fe47512a7c96658c24253a045ff38e8075cd7fb37ca6fcadfa6e6d093333943ad24f6fc4f163ec5b74fd940de9d5bb03eb4d3b:
------
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-17]
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-17', '--enable-llvm-link-shared', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-17/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.thin-lto-import-instr-limit := 10
configure: change-id            := 99999999
---
---- [ui] tests/ui/traits/vtable/issue-91807.rs stdout ----

error: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/ui/traits/vtable/issue-91807.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--check-cfg" "cfg(FALSE)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807/issue-91807.inc" "-Z" "incremental-verify-ich" "--error-format" "json" "--json" "future-incompat" "-Ccodegen-units=1" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zwrite-long-types-to-disk=no" "-Cstrip=debuginfo" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807" "-A" "unused" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/traits/vtable/issue-91807/auxiliary"
--- stderr -------------------------------
--- stderr -------------------------------
error[E0161]: cannot move a value of type `<dyn Fn(i32) as FnOnce<(i32,)>>::Output`
   |
LL |     f(0);
LL |     f(0);
   |     ^^^^ the size of `<dyn Fn(i32) as FnOnce<(i32,)>>::Output` cannot be statically determined
error: aborting due to 1 previous error

For more information about this error, try `rustc --explain E0161`.
------------------------------------------

rust-log-analyzer avatar Aug 19 '24 15:08 rust-log-analyzer

oops

@rustbot author

Bryanskiy avatar Aug 19 '24 15:08 Bryanskiy

@rustbot ready

Bryanskiy avatar Aug 19 '24 16:08 Bryanskiy

:umbrella: The latest upstream changes (presumably #128252) made this pull request unmergeable. Please resolve the merge conflicts.

bors avatar Aug 20 '24 23:08 bors

I'd like to see another rebase + perf run before merging this. Will only get to reviewing it by the end of this week/start of the next

lcnr avatar Aug 25 '24 13:08 lcnr

@bors try @rust-timer queue

lqd avatar Aug 25 '24 16:08 lqd

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer avatar Aug 25 '24 16:08 rust-timer

:hourglass: Trying commit 2220d99f41b8b1029ab363ba43c3f72c36e9a037 with merge ca72c587d056bf0f39ef4d6ac03c45705c7924f4...

bors avatar Aug 25 '24 16:08 bors