runtime
JIT: update block weight for uncond to cond flow opt
This optimization duplicates the code and flow of a BBJ_COND successor into one of its preds; as a result, the weight of the successor should decrease.
Fixes some issues seen with odd perf scores in the ML/CSE experiment.
Contributes to #93020
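A minimal sketch of the weight bookkeeping described above, under stated assumptions. `Block`, `JumpKind` and `OptimizeUncondToCond` are hypothetical names for illustration, not the JIT's `BasicBlock` API. The sketch assumes `pred` ends in an unconditional jump to `succ`, that `succ` ends in a two-way conditional branch, and that the duplicated test keeps `succ`'s branch likelihood along this path.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified block model (not the JIT's BasicBlock).
enum class JumpKind { Always, Cond };

struct Block {
    JumpKind kind = JumpKind::Always;
    double weight = 0.0;          // profile-derived execution count
    Block* target = nullptr;      // Always: sole successor; Cond: taken successor
    Block* falseTarget = nullptr; // Cond only: not-taken successor
    double trueLikelihood = 0.0;  // Cond only: probability of going to 'target'
};

// 'pred' jumps unconditionally to 'succ', which ends in a conditional branch.
// Duplicate the condition into 'pred' and wire it directly to succ's successors.
void OptimizeUncondToCond(Block& pred) {
    Block* succ = pred.target;
    assert(pred.kind == JumpKind::Always && succ->kind == JumpKind::Cond);

    // Assumption: the duplicated test has the same likelihood as the original.
    pred.kind = JumpKind::Cond;
    pred.target = succ->target;
    pred.falseTarget = succ->falseTarget;
    pred.trueLikelihood = succ->trueLikelihood;

    // Flow from pred no longer passes through succ, so succ's weight drops by
    // pred's weight; clamp at zero in case the profile was inconsistent.
    succ->weight = std::max(0.0, succ->weight - pred.weight);
    // succ's successors keep their weights: the same flow now arrives via pred.
}
```

In this model the successors of `succ` need no adjustment; only `succ` itself loses the flow that used to pass through it.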
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch. See info in area-owners.md if you want to be subscribed.
Issue Details
Author: | AndyAyersMS |
---|---|
Assignees: | - |
Labels: | |
Milestone: | - |
FYI @dotnet/jit-contrib
A few large local regressions. The ones I looked at were all due to additional cloning in loops with type tests.
Diff results for #98324
Assembly diffs
Assembly diffs for linux/arm64 ran on windows/x64
Diffs are based on 2,520,572 contexts (999,218 MinOpts, 1,521,354 FullOpts).
MISSED contexts: base: 4 (0.00%), diff: 97 (0.00%)
Overall (+366,484 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.linux.arm64.checked.mch | 74,854,976 | -16,084 |
coreclr_tests.run.linux.arm64.checked.mch | 509,287,460 | +21,008 |
libraries.pmi.linux.arm64.checked.mch | 76,692,260 | +136 |
libraries_tests.run.linux.arm64.Release.mch | 380,944,972 | +363,948 |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 164,707,488 | -2,524 |
FullOpts (+366,484 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.linux.arm64.checked.mch | 52,852,636 | -16,084 |
coreclr_tests.run.linux.arm64.checked.mch | 160,417,140 | +21,008 |
libraries.pmi.linux.arm64.checked.mch | 76,572,276 | +136 |
libraries_tests.run.linux.arm64.Release.mch | 166,327,572 | +363,948 |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 151,304,588 | -2,524 |
Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on 2,542,702 contexts (985,624 MinOpts, 1,557,078 FullOpts).
MISSED contexts: base: 0 (0.00%), diff: 95 (0.00%)
Overall (+420,577 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.linux.x64.checked.mch | 69,699,439 | -55,836 |
coreclr_tests.run.linux.x64.checked.mch | 403,413,227 | +19,477 |
libraries.pmi.linux.x64.checked.mch | 60,773,002 | +2,467 |
libraries_tests.run.linux.x64.Release.mch | 336,651,500 | +446,207 |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 132,477,813 | +8,262 |
FullOpts (+420,577 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.linux.x64.checked.mch | 46,789,254 | -55,836 |
coreclr_tests.run.linux.x64.checked.mch | 123,798,121 | +19,477 |
libraries.pmi.linux.x64.checked.mch | 60,660,145 | +2,467 |
libraries_tests.run.linux.x64.Release.mch | 153,474,826 | +446,207 |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 121,893,760 | +8,262 |
Assembly diffs for osx/arm64 ran on windows/x64
Diffs are based on 2,262,128 contexts (921,087 MinOpts, 1,341,041 FullOpts).
MISSED contexts: base: 3 (0.00%), diff: 77 (0.00%)
Overall (+202,992 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.osx.arm64.checked.mch | 24,713,208 | -9,188 |
coreclr_tests.run.osx.arm64.checked.mch | 476,227,036 | +16,244 |
libraries_tests.run.osx.arm64.Release.mch | 312,891,008 | +199,048 |
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 162,507,756 | -3,112 |
FullOpts (+202,992 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.osx.arm64.checked.mch | 8,957,196 | -9,188 |
coreclr_tests.run.osx.arm64.checked.mch | 150,934,824 | +16,244 |
libraries_tests.run.osx.arm64.Release.mch | 111,510,444 | +199,048 |
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 149,448,344 | -3,112 |
Assembly diffs for windows/arm64 ran on windows/x64
Diffs are based on 2,368,064 contexts (937,277 MinOpts, 1,430,787 FullOpts).
MISSED contexts: base: 0 (0.00%), diff: 84 (0.00%)
Overall (+305,988 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.windows.arm64.checked.mch | 46,686,644 | +1,196 |
coreclr_tests.run.windows.arm64.checked.mch | 496,328,272 | +21,196 |
libraries.pmi.windows.arm64.checked.mch | 80,267,692 | +828 |
libraries_tests.run.windows.arm64.Release.mch | 323,036,384 | +281,288 |
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 171,257,132 | +1,480 |
FullOpts (+305,988 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.windows.arm64.checked.mch | 30,351,036 | +1,196 |
coreclr_tests.run.windows.arm64.checked.mch | 156,850,884 | +21,196 |
libraries.pmi.windows.arm64.checked.mch | 80,147,708 | +828 |
libraries_tests.run.windows.arm64.Release.mch | 119,164,376 | +281,288 |
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 158,197,788 | +1,480 |
Assembly diffs for windows/x64 ran on windows/x64
Diffs are based on 2,908,360 contexts (1,240,334 MinOpts, 1,668,026 FullOpts).
MISSED contexts: base: 133 (0.00%), diff: 223 (0.01%)
Overall (+239,961 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
aspnet.run.windows.x64.checked.mch | 46,759,029 | +26,975 |
benchmarks.run_pgo.windows.x64.checked.mch | 45,780,921 | -82,475 |
coreclr_tests.run.windows.x64.checked.mch | 464,618,729 | +19,542 |
libraries.pmi.windows.x64.checked.mch | 64,172,055 | +3,231 |
libraries_tests.run.windows.x64.Release.mch | 309,629,720 | +259,697 |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 149,849,548 | +12,991 |
FullOpts (+239,961 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
aspnet.run.windows.x64.checked.mch | 28,268,214 | +26,975 |
benchmarks.run_pgo.windows.x64.checked.mch | 23,454,275 | -82,475 |
coreclr_tests.run.windows.x64.checked.mch | 130,697,415 | +19,542 |
libraries.pmi.windows.x64.checked.mch | 64,058,534 | +3,231 |
libraries_tests.run.windows.x64.Release.mch | 110,763,662 | +259,697 |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 138,621,801 | +12,991 |
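In each assembly diff summary above, the "Diff size (bytes)" column is a per-collection delta, and the Overall line is the sum of those deltas. A small check using the linux/arm64 values copied from its table (`kLinuxArm64Deltas` and `OverallDelta` are illustrative names):

```cpp
#include <array>
#include <cassert>
#include <numeric>

// Per-collection code size deltas in bytes for linux/arm64, from the table above.
constexpr std::array<long long, 5> kLinuxArm64Deltas = {
    -16084,   // benchmarks.run_pgo
    +21008,   // coreclr_tests.run
    +136,     // libraries.pmi
    +363948,  // libraries_tests.run
    -2524,    // libraries_tests_no_tiered_compilation.run
};

// The Overall figure is the sum of the per-collection deltas.
long long OverallDelta() {
    return std::accumulate(kLinuxArm64Deltas.begin(), kLinuxArm64Deltas.end(), 0LL);
}
```

`OverallDelta()` yields 366,484, matching the reported Overall (+366,484 bytes); the FullOpts total is identical because the MinOpts rows show no size change.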
Throughput diffs
Throughput diffs for linux/arm64 ran on windows/x64
Overall (-0.01% to +0.18%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.arm64.checked.mch | -0.01% |
coreclr_tests.run.linux.arm64.checked.mch | +0.01% |
libraries_tests.run.linux.arm64.Release.mch | +0.18% |
MinOpts (-0.00% to +0.01%)
Collection | PDIFF |
---|---|
libraries.pmi.linux.arm64.checked.mch | +0.01% |
FullOpts (-0.01% to +0.24%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.arm64.checked.mch | -0.01% |
coreclr_tests.run.linux.arm64.checked.mch | +0.02% |
libraries_tests.run.linux.arm64.Release.mch | +0.24% |
Throughput diffs for linux/x64 ran on windows/x64
Overall (-0.02% to +0.21%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.x64.checked.mch | -0.02% |
coreclr_tests.run.linux.x64.checked.mch | +0.01% |
libraries_tests.run.linux.x64.Release.mch | +0.21% |
FullOpts (-0.02% to +0.26%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.x64.checked.mch | -0.02% |
coreclr_tests.run.linux.x64.checked.mch | +0.01% |
libraries_tests.run.linux.x64.Release.mch | +0.26% |
Throughput diffs for osx/arm64 ran on windows/x64
Overall (-0.01% to +0.14%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.osx.arm64.checked.mch | -0.01% |
coreclr_tests.run.osx.arm64.checked.mch | +0.01% |
libraries_tests.run.osx.arm64.Release.mch | +0.14% |
FullOpts (-0.01% to +0.21%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.osx.arm64.checked.mch | -0.01% |
coreclr_tests.run.osx.arm64.checked.mch | +0.01% |
libraries_tests.run.osx.arm64.Release.mch | +0.21% |
Throughput diffs for windows/arm64 ran on windows/x64
Overall (-0.00% to +0.19%)
Collection | PDIFF |
---|---|
coreclr_tests.run.windows.arm64.checked.mch | +0.01% |
libraries_tests.run.windows.arm64.Release.mch | +0.19% |
MinOpts (-0.01% to +0.00%)
Collection | PDIFF |
---|---|
libraries.pmi.windows.arm64.checked.mch | -0.01% |
FullOpts (-0.00% to +0.27%)
Collection | PDIFF |
---|---|
coreclr_tests.run.windows.arm64.checked.mch | +0.01% |
libraries_tests.run.windows.arm64.Release.mch | +0.27% |
Throughput diffs for windows/x64 ran on windows/x64
Overall (-0.02% to +0.15%)
Collection | PDIFF |
---|---|
aspnet.run.windows.x64.checked.mch | +0.06% |
benchmarks.run_pgo.windows.x64.checked.mch | -0.02% |
coreclr_tests.run.windows.x64.checked.mch | +0.01% |
libraries_tests.run.windows.x64.Release.mch | +0.15% |
FullOpts (-0.02% to +0.21%)
Collection | PDIFF |
---|---|
aspnet.run.windows.x64.checked.mch | +0.07% |
benchmarks.run_pgo.windows.x64.checked.mch | -0.02% |
coreclr_tests.run.windows.x64.checked.mch | +0.01% |
libraries_tests.run.windows.x64.Release.mch | +0.21% |
Throughput diffs for linux/arm ran on windows/x86
Overall (-0.01% to +0.06%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.arm.checked.mch | -0.01% |
libraries.pmi.linux.arm.checked.mch | +0.01% |
libraries_tests.run.linux.arm.Release.mch | +0.06% |
FullOpts (-0.01% to +0.08%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.arm.checked.mch | -0.01% |
libraries.pmi.linux.arm.checked.mch | +0.01% |
libraries_tests.run.linux.arm.Release.mch | +0.08% |
Throughput diffs for windows/x86 ran on windows/x86
Overall (-0.02% to +0.01%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.windows.x86.checked.mch | -0.02% |
libraries_tests.run.windows.x86.Release.mch | +0.01% |
FullOpts (-0.02% to +0.02%)
Collection | PDIFF |
---|---|
benchmarks.run_pgo.windows.x86.checked.mch | -0.02% |
libraries_tests.run.windows.x86.Release.mch | +0.02% |
Throughput diffs for linux/arm64 ran on linux/x64
Overall (-0.01% to +0.18%)
Collection | PDIFF |
---|---|
libraries_tests.run.linux.arm64.Release.mch | +0.18% |
benchmarks.run_pgo.linux.arm64.checked.mch | -0.01% |
coreclr_tests.run.linux.arm64.checked.mch | +0.01% |
FullOpts (-0.01% to +0.24%)
Collection | PDIFF |
---|---|
libraries_tests.run.linux.arm64.Release.mch | +0.24% |
benchmarks.run_pgo.linux.arm64.checked.mch | -0.01% |
coreclr_tests.run.linux.arm64.checked.mch | +0.01% |
Throughput diffs for linux/x64 ran on linux/x64
Overall (-0.02% to +0.20%)
Collection | PDIFF |
---|---|
libraries_tests.run.linux.x64.Release.mch | +0.20% |
coreclr_tests.run.linux.x64.checked.mch | +0.01% |
benchmarks.run_pgo.linux.x64.checked.mch | -0.02% |
FullOpts (-0.02% to +0.26%)
Collection | PDIFF |
---|---|
libraries_tests.run.linux.x64.Release.mch | +0.26% |
coreclr_tests.run.linux.x64.checked.mch | +0.01% |
benchmarks.run_pgo.linux.x64.checked.mch | -0.02% |
Diff results for #98324
Assembly diffs
Assembly diffs for linux/arm ran on windows/x86
Diffs are based on 2,257,223 contexts (832,052 MinOpts, 1,425,171 FullOpts).
MISSED contexts: base: 73,583 (3.16%), diff: 73,599 (3.16%)
Overall (+122,712 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.linux.arm.checked.mch | 65,676,478 | +36,494 |
coreclr_tests.run.linux.arm.checked.mch | 321,682,372 | +4,784 |
libraries.pmi.linux.arm.checked.mch | 50,272,220 | +36 |
libraries_tests.run.linux.arm.Release.mch | 239,445,652 | +79,630 |
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | 94,257,664 | +1,768 |
FullOpts (+122,712 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.linux.arm.checked.mch | 53,624,530 | +36,494 |
coreclr_tests.run.linux.arm.checked.mch | 109,216,788 | +4,784 |
libraries.pmi.linux.arm.checked.mch | 50,165,996 | +36 |
libraries_tests.run.linux.arm.Release.mch | 117,579,462 | +79,630 |
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | 84,227,820 | +1,768 |
Assembly diffs for windows/x86 ran on windows/x86
Diffs are based on 2,678,702 contexts (1,054,747 MinOpts, 1,623,955 FullOpts).
MISSED contexts: base: 11 (0.00%), diff: 656 (0.02%)
Overall (-51,776 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.windows.x86.checked.mch | 55,285,524 | -117,707 |
coreclr_tests.run.windows.x86.checked.mch | 371,677,826 | +6,755 |
libraries.pmi.windows.x86.checked.mch | 49,759,154 | +3,071 |
libraries_tests.run.windows.x86.Release.mch | 206,782,528 | +44,446 |
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | 112,705,032 | +11,659 |
FullOpts (-51,776 bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.windows.x86.checked.mch | 44,439,683 | -117,707 |
coreclr_tests.run.windows.x86.checked.mch | 119,096,783 | +6,755 |
libraries.pmi.windows.x86.checked.mch | 49,663,921 | +3,071 |
libraries_tests.run.windows.x86.Release.mch | 97,462,256 | +44,446 |
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | 103,862,275 | +11,659 |
Hmm, rather bigger diffs than I was expecting.
I will need to dig in and see if this is all attributable to more cloning, and whether it is time to at least build some kind of vague heuristic.
Looks like regressions are indeed from more cloning.
In particular, type test cloning is driven by the likelihood of the type test succeeding, and with this profile update we now see more tests that appear successful.
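A hypothetical sketch of how a likelihood-driven cloning decision can flip when profile weights change. The names `ShouldCloneForTypeTest` and `TypeTestLikelihood`, the 0.5 cutoff, and the estimate of likelihood as a ratio of block weights are all assumptions for illustration, not the JIT's actual heuristic.

```cpp
#include <cassert>

// Assumed cutoff for illustration; not the JIT's real threshold.
constexpr double kCloneLikelihoodThreshold = 0.5;

// Assumption: success likelihood is estimated as the weight reaching the
// success path divided by the weight of the block holding the type test.
double TypeTestLikelihood(double successWeight, double testBlockWeight) {
    if (testBlockWeight <= 0.0) {
        return 0.0;  // no profile data for the test block
    }
    double p = successWeight / testBlockWeight;
    return p > 1.0 ? 1.0 : p;  // clamp inconsistent profiles
}

// Clone the loop only when the type test appears likely to succeed.
bool ShouldCloneForTypeTest(double successWeight, double testBlockWeight) {
    return TypeTestLikelihood(successWeight, testBlockWeight) >= kCloneLikelihoodThreshold;
}
```

Under these assumptions, an over-counted test block (weight 100, success weight 40) gives 0.4 and no clone; once the block's weight is lowered to 60, the same success weight gives about 0.67 and the loop is cloned.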
@amanasifkhalid, can you take another look? I removed the Next block and now wire up the flow directly.
TP diffs are good, and PerfScore diffs are good. Code size increases, but mainly in libraries tests. The code size impact comes entirely from more or fewer clones; all the ones I saw were from the "clone for type test" heuristic, which relies on profile data.
The failure is an SPMI replay timeout on linux arm32.
Improvements on arm64:
- https://github.com/dotnet/perf-autofiling-issues/issues/30175
- https://github.com/dotnet/perf-autofiling-issues/issues/30633
- https://github.com/dotnet/perf-autofiling-issues/issues/30646
- https://github.com/dotnet/perf-autofiling-issues/issues/30660