JIT: update block weight for uncond to cond flow opt

Open AndyAyersMS opened this issue 1 year ago • 2 comments

This optimization duplicates code and flow in a BBJ_COND successor into one of its preds; as a result the weight of the successor should decrease.

Fixes some issues seen with odd perf scores in the ML/CSE experiment.

Contributes to #93020

AndyAyersMS, Feb 12 '24 20:02
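As a rough illustration of the weight adjustment described above, here is a minimal sketch. It assumes the pred jumps unconditionally to the BBJ_COND block, so all of the pred's weight used to flow through that block. `Block` and `updateWeightAfterDuplication` are hypothetical names made up for this example, not the JIT's actual types or functions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a flow graph block; not the JIT's BasicBlock.
struct Block
{
    double weight; // profile-derived execution count
};

// Assumed to be called after the code and flow of condBlock have been
// duplicated into pred, so flow from pred no longer passes through condBlock.
// Returns the updated weight of condBlock.
double updateWeightAfterDuplication(const Block& pred, Block& condBlock)
{
    // The flow that used to arrive from pred now bypasses condBlock.
    // Clamp at zero because profile data can be inconsistent.
    condBlock.weight = std::max(0.0, condBlock.weight - pred.weight);
    return condBlock.weight;
}
```

For example, a successor with weight 100 whose pred has weight 30 would end up with weight 70 once the pred carries its own copy of the branch.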

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch. See info in area-owners.md if you want to be subscribed.

Author: AndyAyersMS
Assignees: -
Labels: area-CodeGen-coreclr
Milestone: -

ghost, Feb 12 '24 20:02

FYI @dotnet/jit-contrib

A few large local regressions. The ones I looked at were all additional cloning in loops with type tests.

AndyAyersMS, Feb 12 '24 21:02

Diff results for #98324

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,520,572 contexts (999,218 MinOpts, 1,521,354 FullOpts).

MISSED contexts: base: 4 (0.00%), diff: 97 (0.00%)

Overall (+366,484 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 74,854,976 -16,084
coreclr_tests.run.linux.arm64.checked.mch 509,287,460 +21,008
libraries.pmi.linux.arm64.checked.mch 76,692,260 +136
libraries_tests.run.linux.arm64.Release.mch 380,944,972 +363,948
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 164,707,488 -2,524
FullOpts (+366,484 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 52,852,636 -16,084
coreclr_tests.run.linux.arm64.checked.mch 160,417,140 +21,008
libraries.pmi.linux.arm64.checked.mch 76,572,276 +136
libraries_tests.run.linux.arm64.Release.mch 166,327,572 +363,948
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,304,588 -2,524

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,542,702 contexts (985,624 MinOpts, 1,557,078 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 95 (0.00%)

Overall (+420,577 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 69,699,439 -55,836
coreclr_tests.run.linux.x64.checked.mch 403,413,227 +19,477
libraries.pmi.linux.x64.checked.mch 60,773,002 +2,467
libraries_tests.run.linux.x64.Release.mch 336,651,500 +446,207
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,477,813 +8,262
FullOpts (+420,577 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 46,789,254 -55,836
coreclr_tests.run.linux.x64.checked.mch 123,798,121 +19,477
libraries.pmi.linux.x64.checked.mch 60,660,145 +2,467
libraries_tests.run.linux.x64.Release.mch 153,474,826 +446,207
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 121,893,760 +8,262

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,262,128 contexts (921,087 MinOpts, 1,341,041 FullOpts).

MISSED contexts: base: 3 (0.00%), diff: 77 (0.00%)

Overall (+202,992 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 24,713,208 -9,188
coreclr_tests.run.osx.arm64.checked.mch 476,227,036 +16,244
libraries_tests.run.osx.arm64.Release.mch 312,891,008 +199,048
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 162,507,756 -3,112
FullOpts (+202,992 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 8,957,196 -9,188
coreclr_tests.run.osx.arm64.checked.mch 150,934,824 +16,244
libraries_tests.run.osx.arm64.Release.mch 111,510,444 +199,048
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 149,448,344 -3,112

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,368,064 contexts (937,277 MinOpts, 1,430,787 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 84 (0.00%)

Overall (+305,988 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 46,686,644 +1,196
coreclr_tests.run.windows.arm64.checked.mch 496,328,272 +21,196
libraries.pmi.windows.arm64.checked.mch 80,267,692 +828
libraries_tests.run.windows.arm64.Release.mch 323,036,384 +281,288
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,257,132 +1,480
FullOpts (+305,988 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 30,351,036 +1,196
coreclr_tests.run.windows.arm64.checked.mch 156,850,884 +21,196
libraries.pmi.windows.arm64.checked.mch 80,147,708 +828
libraries_tests.run.windows.arm64.Release.mch 119,164,376 +281,288
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,197,788 +1,480

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,908,360 contexts (1,240,334 MinOpts, 1,668,026 FullOpts).

MISSED contexts: base: 133 (0.00%), diff: 223 (0.01%)

Overall (+239,961 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 46,759,029 +26,975
benchmarks.run_pgo.windows.x64.checked.mch 45,780,921 -82,475
coreclr_tests.run.windows.x64.checked.mch 464,618,729 +19,542
libraries.pmi.windows.x64.checked.mch 64,172,055 +3,231
libraries_tests.run.windows.x64.Release.mch 309,629,720 +259,697
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 149,849,548 +12,991
FullOpts (+239,961 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 28,268,214 +26,975
benchmarks.run_pgo.windows.x64.checked.mch 23,454,275 -82,475
coreclr_tests.run.windows.x64.checked.mch 130,697,415 +19,542
libraries.pmi.windows.x64.checked.mch 64,058,534 +3,231
libraries_tests.run.windows.x64.Release.mch 110,763,662 +259,697
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 138,621,801 +12,991



Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to +0.18%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch +0.18%
MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%
FullOpts (-0.01% to +0.24%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.24%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.02% to +0.21%)
Collection PDIFF
benchmarks.run_pgo.linux.x64.checked.mch -0.02%
coreclr_tests.run.linux.x64.checked.mch +0.01%
libraries_tests.run.linux.x64.Release.mch +0.21%
FullOpts (-0.02% to +0.26%)
Collection PDIFF
benchmarks.run_pgo.linux.x64.checked.mch -0.02%
coreclr_tests.run.linux.x64.checked.mch +0.01%
libraries_tests.run.linux.x64.Release.mch +0.26%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to +0.14%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.14%
FullOpts (-0.01% to +0.21%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.21%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.00% to +0.19%)
Collection PDIFF
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.19%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%
FullOpts (-0.00% to +0.27%)
Collection PDIFF
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.27%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.02% to +0.15%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.06%
benchmarks.run_pgo.windows.x64.checked.mch -0.02%
coreclr_tests.run.windows.x64.checked.mch +0.01%
libraries_tests.run.windows.x64.Release.mch +0.15%
FullOpts (-0.02% to +0.21%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.07%
benchmarks.run_pgo.windows.x64.checked.mch -0.02%
coreclr_tests.run.windows.x64.checked.mch +0.01%
libraries_tests.run.windows.x64.Release.mch +0.21%



Throughput diffs for linux/arm ran on windows/x86

Overall (-0.01% to +0.06%)
Collection PDIFF
benchmarks.run_pgo.linux.arm.checked.mch -0.01%
libraries.pmi.linux.arm.checked.mch +0.01%
libraries_tests.run.linux.arm.Release.mch +0.06%
FullOpts (-0.01% to +0.08%)
Collection PDIFF
benchmarks.run_pgo.linux.arm.checked.mch -0.01%
libraries.pmi.linux.arm.checked.mch +0.01%
libraries_tests.run.linux.arm.Release.mch +0.08%

Throughput diffs for windows/x86 ran on windows/x86

Overall (-0.02% to +0.01%)
Collection PDIFF
benchmarks.run_pgo.windows.x86.checked.mch -0.02%
libraries_tests.run.windows.x86.Release.mch +0.01%
FullOpts (-0.02% to +0.02%)
Collection PDIFF
benchmarks.run_pgo.windows.x86.checked.mch -0.02%
libraries_tests.run.windows.x86.Release.mch +0.02%



Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.01% to +0.18%)
Collection PDIFF
libraries_tests.run.linux.arm64.Release.mch +0.18%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
FullOpts (-0.01% to +0.24%)
Collection PDIFF
libraries_tests.run.linux.arm64.Release.mch +0.24%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%

Throughput diffs for linux/x64 ran on linux/x64

Overall (-0.02% to +0.20%)
Collection PDIFF
libraries_tests.run.linux.x64.Release.mch +0.20%
coreclr_tests.run.linux.x64.checked.mch +0.01%
benchmarks.run_pgo.linux.x64.checked.mch -0.02%
FullOpts (-0.02% to +0.26%)
Collection PDIFF
libraries_tests.run.linux.x64.Release.mch +0.26%
coreclr_tests.run.linux.x64.checked.mch +0.01%
benchmarks.run_pgo.linux.x64.checked.mch -0.02%



ryujit-bot, Feb 13 '24 00:02

Diff results for #98324

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,257,223 contexts (832,052 MinOpts, 1,425,171 FullOpts).

MISSED contexts: base: 73,583 (3.16%), diff: 73,599 (3.16%)

Overall (+122,712 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 65,676,478 +36,494
coreclr_tests.run.linux.arm.checked.mch 321,682,372 +4,784
libraries.pmi.linux.arm.checked.mch 50,272,220 +36
libraries_tests.run.linux.arm.Release.mch 239,445,652 +79,630
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,257,664 +1,768
FullOpts (+122,712 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 53,624,530 +36,494
coreclr_tests.run.linux.arm.checked.mch 109,216,788 +4,784
libraries.pmi.linux.arm.checked.mch 50,165,996 +36
libraries_tests.run.linux.arm.Release.mch 117,579,462 +79,630
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,227,820 +1,768

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,678,702 contexts (1,054,747 MinOpts, 1,623,955 FullOpts).

MISSED contexts: base: 11 (0.00%), diff: 656 (0.02%)

Overall (-51,776 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 55,285,524 -117,707
coreclr_tests.run.windows.x86.checked.mch 371,677,826 +6,755
libraries.pmi.windows.x86.checked.mch 49,759,154 +3,071
libraries_tests.run.windows.x86.Release.mch 206,782,528 +44,446
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 112,705,032 +11,659
FullOpts (-51,776 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 44,439,683 -117,707
coreclr_tests.run.windows.x86.checked.mch 119,096,783 +6,755
libraries.pmi.windows.x86.checked.mch 49,663,921 +3,071
libraries_tests.run.windows.x86.Release.mch 97,462,256 +44,446
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,862,275 +11,659



ryujit-bot, Feb 13 '24 01:02

Hmm, rather bigger diffs than I was expecting.

I will need to dig in and see if this is all attributable to more cloning, and whether it is time to at least build some kind of vague heuristic.

AndyAyersMS, Feb 13 '24 01:02

Looks like regressions are indeed from more cloning.

AndyAyersMS, Feb 26 '24 23:02

In particular, type test cloning is driven by the likelihood of the type test succeeding, and with this profile update more tests now appear likely to succeed.

AndyAyersMS, Feb 26 '24 23:02
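To make the point about likelihood-driven cloning concrete, here is a hedged sketch of how such a decision might look. Everything in it is an assumption for illustration: `TypeTestProfile`, `successLikelihood`, `shouldCloneForTypeTest`, and the threshold value are invented here and are not taken from the JIT sources. It only shows why lowering the weight of the block that holds the test can push the apparent success likelihood over a cutoff.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical profile summary for one type test inside a loop.
struct TypeTestProfile
{
    double testWeight;    // how often the block holding the test ran
    double successWeight; // how often the "type matches" edge was taken
};

// Fraction of executions in which the test appears to succeed.
double successLikelihood(const TypeTestProfile& p)
{
    if (p.testWeight <= 0.0)
    {
        return 0.0; // no profile data, treat as unlikely
    }
    return std::min(1.0, p.successWeight / p.testWeight);
}

// Clone only when the test looks likely enough to succeed.
// The threshold is an arbitrary illustrative parameter.
bool shouldCloneForTypeTest(const TypeTestProfile& p, double threshold)
{
    return successLikelihood(p) >= threshold;
}
```

With an illustrative threshold of 0.8, a test with weight 100 and success weight 60 would not be cloned, while the same test with its block weight reduced to 70 would be. That is one plausible way a more accurate block weight could lead to more cloning.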

@amanasifkhalid can you take another look? I removed the Next block and now just wire up the flow directly.

TP diffs look good, and PerfScore diffs look good. Code size increases, but mainly in the libraries tests. The code size impact is all from more or fewer clones; all the ones I saw were from the "clone for type test" heuristic, which relies on profile data.

AndyAyersMS, Feb 27 '24 15:02
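The observation that the size increase comes mainly from the libraries tests can be checked against the first linux/arm64 FullOpts report earlier in this thread. The sketch below copies those five rows and computes the total and the per-collection relative change. `CollectionDiff`, `totalDiff`, `percentChange`, and `linuxArm64FullOpts` are hypothetical helper names; only the numbers come from the report.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of a diff report: collection name, base size, signed size change.
struct CollectionDiff
{
    std::string  name;
    std::int64_t baseBytes;
    std::int64_t diffBytes;
};

// Sum of the signed size changes across all collections.
std::int64_t totalDiff(const std::vector<CollectionDiff>& rows)
{
    std::int64_t total = 0;
    for (const CollectionDiff& row : rows)
    {
        total += row.diffBytes;
    }
    return total;
}

// Size change relative to the base size, in percent.
double percentChange(const CollectionDiff& row)
{
    return 100.0 * static_cast<double>(row.diffBytes) / static_cast<double>(row.baseBytes);
}

// FullOpts rows for linux/arm64, copied from the report in this thread.
std::vector<CollectionDiff> linuxArm64FullOpts()
{
    return {
        {"benchmarks.run_pgo", 52852636, -16084},
        {"coreclr_tests.run", 160417140, 21008},
        {"libraries.pmi", 76572276, 136},
        {"libraries_tests.run", 166327572, 363948},
        {"libraries_tests_no_tiered_compilation.run", 151304588, -2524},
    };
}
```

The rows sum to the reported +366,484 bytes, and `libraries_tests.run` alone accounts for +363,948 of that, which is a change of roughly 0.2% relative to its base size.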

The failure is a timeout in the SPMI replay for linux arm32.

AndyAyersMS, Feb 27 '24 15:02

Improvements on arm64:

  • https://github.com/dotnet/perf-autofiling-issues/issues/30175

EgorBo, Feb 29 '24 17:02

Improvements on arm64:

  • https://github.com/dotnet/perf-autofiling-issues/issues/30633
  • https://github.com/dotnet/perf-autofiling-issues/issues/30646
  • https://github.com/dotnet/perf-autofiling-issues/issues/30660

EgorBo, Mar 07 '24 17:03