runtime
runtime copied to clipboard
JIT: Visit switch successors in increasing likelihood order for RPO-based layout
Follow-up to #101473. Also cleans up the BasicBlock
successor visitor API surface a bit by separating logic for visiting successors in increasing likelihood order into BasicBlock::VisitAllSuccsInLikelihoodOrder
.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Here's the diff summary on win-x64, since the RPO-based layout is disabled by default. Diffs aren't that big, and they aren't overwhelmingly in one direction...
Diffs are based on 2,534,676 contexts (987,849 MinOpts, 1,546,827 FullOpts).
MISSED contexts: 2,922 (0.12%)
Overall (-8,298 bytes)
Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
---|---|---|---|
aspnet.run.windows.x64.checked.mch | 39,855,926 | +1,073 | +0.20% |
benchmarks.run.windows.x64.checked.mch | 8,710,150 | -108 | -0.05% |
benchmarks.run_pgo.windows.x64.checked.mch | 32,332,197 | -718 | +0.39% |
benchmarks.run_tiered.windows.x64.checked.mch | 12,340,101 | -23 | +0.03% |
coreclr_tests.run.windows.x64.checked.mch | 402,921,293 | -325 | +0.27% |
libraries.crossgen2.windows.x64.checked.mch | 45,252,213 | -1,591 | -0.13% |
libraries.pmi.windows.x64.checked.mch | 63,617,634 | -575 | -0.11% |
libraries_tests.run.windows.x64.Release.mch | 285,052,618 | -2,971 | +0.79% |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 137,049,028 | -574 | -0.14% |
realworld.run.windows.x64.checked.mch | 13,552,182 | -1,625 | -0.08% |
smoke_tests.nativeaot.windows.x64.checked.mch | 5,032,760 | -861 | -1.60% |
FullOpts (-8,298 bytes)
Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
---|---|---|---|
aspnet.run.windows.x64.checked.mch | 27,307,698 | +1,073 | +0.20% |
benchmarks.run.windows.x64.checked.mch | 8,709,728 | -108 | -0.05% |
benchmarks.run_pgo.windows.x64.checked.mch | 18,519,682 | -718 | +0.39% |
benchmarks.run_tiered.windows.x64.checked.mch | 3,077,579 | -23 | +0.03% |
coreclr_tests.run.windows.x64.checked.mch | 121,562,176 | -325 | +0.27% |
libraries.crossgen2.windows.x64.checked.mch | 45,250,508 | -1,591 | -0.13% |
libraries.pmi.windows.x64.checked.mch | 63,504,133 | -575 | -0.11% |
libraries_tests.run.windows.x64.Release.mch | 107,104,520 | -2,971 | +0.79% |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 126,736,473 | -574 | -0.14% |
realworld.run.windows.x64.checked.mch | 13,146,461 | -1,625 | -0.08% |
smoke_tests.nativeaot.windows.x64.checked.mch | 5,031,713 | -861 | -1.60% |
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. If you don't think it's worth churning the RPO layout implementation while running an experiment in the perf lab (especially since the diffs don't look like particularly large wins), I'm happy to undo those changes and just keep the block visitor cleanup work.
@jakobbotsch this cleans up the BasicBlock
visitor API surface a bit. I initially tried something like the initializer pattern you mentioned over in #101473, but the current usage of AllSuccessorEnumerator
in fgRunDfs
complicated this approach -- what do you think of just passing the useProfile
template argument in fgRunDfs
to AllSuccessorEnumerator
? It introduces some code duplication to the latter's constructor, but it's otherwise a pretty localized change.