JIT: update block weight for uncond to cond flow opt

This optimization duplicates code and flow in a BBJ_COND successor into one of its preds; as a result the weight of the successor should decrease.
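As a rough illustration of the weight update described above, here is a minimal Python model. It is not the JIT's C++ code: the `Block` class and the `uncond_to_cond` helper are hypothetical names made up for this sketch, and only the block kinds (`BBJ_ALWAYS`, `BBJ_COND`, `BBJ_RETURN`) are taken from the JIT. It assumes the pred reaches the successor through an unconditional jump and that all of the pred's weight used to flow into the successor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity comparison; blocks are graph nodes
class Block:
    """Hypothetical stand-in for a flow graph block with a profile weight."""
    name: str
    weight: float
    kind: str
    succs: List["Block"] = field(default_factory=list)


def uncond_to_cond(pred: Block, target: Block) -> None:
    """Duplicate target's conditional branch into pred and fix up weights.

    Assumes pred is BBJ_ALWAYS with target as its only successor, and that
    target is BBJ_COND with exactly two successors.
    """
    assert pred.kind == "BBJ_ALWAYS"
    assert len(pred.succs) == 1 and pred.succs[0] is target
    assert target.kind == "BBJ_COND" and len(target.succs) == 2

    # pred now evaluates its own copy of the condition and branches directly.
    pred.kind = "BBJ_COND"
    pred.succs = list(target.succs)

    # The flow that used to pass pred -> target now bypasses target,
    # so target's weight drops by pred's weight (never below zero).
    target.weight = max(0.0, target.weight - pred.weight)


taken = Block("taken", 60.0, "BBJ_RETURN")
not_taken = Block("not_taken", 40.0, "BBJ_RETURN")
target = Block("target", 100.0, "BBJ_COND", [taken, not_taken])
pred = Block("pred", 30.0, "BBJ_ALWAYS", [target])
uncond_to_cond(pred, target)
```

In this toy example the successor's weight falls from 100 to 70 once the pred's flow of 30 no longer passes through it. Without that subtraction the successor would keep a stale weight of 100.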

Fixes some issues seen with odd perf scores in the ML/CSE experiment.

Contributes to #93020

Feb 12 '24 20:02 AndyAyersMS

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

Feb 12 '24 20:02 ghost

FYI @dotnet/jit-contrib

A few large local regressions. The ones I looked at were all additional cloning in loops with type tests.

Feb 12 '24 21:02 AndyAyersMS

Diff results for #98324

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,520,572 contexts (999,218 MinOpts, 1,521,354 FullOpts).

MISSED contexts: base: 4 (0.00%), diff: 97 (0.00%)

Overall (+366,484 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch	74,854,976	-16,084
coreclr_tests.run.linux.arm64.checked.mch	509,287,460	+21,008
libraries.pmi.linux.arm64.checked.mch	76,692,260	+136
libraries_tests.run.linux.arm64.Release.mch	380,944,972	+363,948
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	164,707,488	-2,524

FullOpts (+366,484 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch	52,852,636	-16,084
coreclr_tests.run.linux.arm64.checked.mch	160,417,140	+21,008
libraries.pmi.linux.arm64.checked.mch	76,572,276	+136
libraries_tests.run.linux.arm64.Release.mch	166,327,572	+363,948
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	151,304,588	-2,524

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,542,702 contexts (985,624 MinOpts, 1,557,078 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 95 (0.00%)

Overall (+420,577 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch	69,699,439	-55,836
coreclr_tests.run.linux.x64.checked.mch	403,413,227	+19,477
libraries.pmi.linux.x64.checked.mch	60,773,002	+2,467
libraries_tests.run.linux.x64.Release.mch	336,651,500	+446,207
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch	132,477,813	+8,262

FullOpts (+420,577 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch	46,789,254	-55,836
coreclr_tests.run.linux.x64.checked.mch	123,798,121	+19,477
libraries.pmi.linux.x64.checked.mch	60,660,145	+2,467
libraries_tests.run.linux.x64.Release.mch	153,474,826	+446,207
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch	121,893,760	+8,262

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,262,128 contexts (921,087 MinOpts, 1,341,041 FullOpts).

MISSED contexts: base: 3 (0.00%), diff: 77 (0.00%)

Overall (+202,992 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch	24,713,208	-9,188
coreclr_tests.run.osx.arm64.checked.mch	476,227,036	+16,244
libraries_tests.run.osx.arm64.Release.mch	312,891,008	+199,048
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	162,507,756	-3,112

FullOpts (+202,992 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch	8,957,196	-9,188
coreclr_tests.run.osx.arm64.checked.mch	150,934,824	+16,244
libraries_tests.run.osx.arm64.Release.mch	111,510,444	+199,048
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	149,448,344	-3,112

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,368,064 contexts (937,277 MinOpts, 1,430,787 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 84 (0.00%)

Overall (+305,988 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch	46,686,644	+1,196
coreclr_tests.run.windows.arm64.checked.mch	496,328,272	+21,196
libraries.pmi.windows.arm64.checked.mch	80,267,692	+828
libraries_tests.run.windows.arm64.Release.mch	323,036,384	+281,288
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	171,257,132	+1,480

FullOpts (+305,988 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch	30,351,036	+1,196
coreclr_tests.run.windows.arm64.checked.mch	156,850,884	+21,196
libraries.pmi.windows.arm64.checked.mch	80,147,708	+828
libraries_tests.run.windows.arm64.Release.mch	119,164,376	+281,288
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	158,197,788	+1,480

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,908,360 contexts (1,240,334 MinOpts, 1,668,026 FullOpts).

MISSED contexts: base: 133 (0.00%), diff: 223 (0.01%)

Overall (+239,961 bytes)

Collection	Base size (bytes)	Diff size (bytes)
aspnet.run.windows.x64.checked.mch	46,759,029	+26,975
benchmarks.run_pgo.windows.x64.checked.mch	45,780,921	-82,475
coreclr_tests.run.windows.x64.checked.mch	464,618,729	+19,542
libraries.pmi.windows.x64.checked.mch	64,172,055	+3,231
libraries_tests.run.windows.x64.Release.mch	309,629,720	+259,697
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	149,849,548	+12,991

FullOpts (+239,961 bytes)

Collection	Base size (bytes)	Diff size (bytes)
aspnet.run.windows.x64.checked.mch	28,268,214	+26,975
benchmarks.run_pgo.windows.x64.checked.mch	23,454,275	-82,475
coreclr_tests.run.windows.x64.checked.mch	130,697,415	+19,542
libraries.pmi.windows.x64.checked.mch	64,058,534	+3,231
libraries_tests.run.windows.x64.Release.mch	110,763,662	+259,697
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	138,621,801	+12,991

Details here

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to +0.18%)

Collection	PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%
libraries_tests.run.linux.arm64.Release.mch	+0.18%

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	+0.01%

FullOpts (-0.01% to +0.24%)

Collection	PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.02%
libraries_tests.run.linux.arm64.Release.mch	+0.24%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.02% to +0.21%)

Collection	PDIFF
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%
coreclr_tests.run.linux.x64.checked.mch	+0.01%
libraries_tests.run.linux.x64.Release.mch	+0.21%

FullOpts (-0.02% to +0.26%)

Collection	PDIFF
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%
coreclr_tests.run.linux.x64.checked.mch	+0.01%
libraries_tests.run.linux.x64.Release.mch	+0.26%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to +0.14%)

Collection	PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch	-0.01%
coreclr_tests.run.osx.arm64.checked.mch	+0.01%
libraries_tests.run.osx.arm64.Release.mch	+0.14%

FullOpts (-0.01% to +0.21%)

Collection	PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch	-0.01%
coreclr_tests.run.osx.arm64.checked.mch	+0.01%
libraries_tests.run.osx.arm64.Release.mch	+0.21%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.00% to +0.19%)

Collection	PDIFF
coreclr_tests.run.windows.arm64.checked.mch	+0.01%
libraries_tests.run.windows.arm64.Release.mch	+0.19%

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
libraries.pmi.windows.arm64.checked.mch	-0.01%

FullOpts (-0.00% to +0.27%)

Collection	PDIFF
coreclr_tests.run.windows.arm64.checked.mch	+0.01%
libraries_tests.run.windows.arm64.Release.mch	+0.27%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.02% to +0.15%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.06%
benchmarks.run_pgo.windows.x64.checked.mch	-0.02%
coreclr_tests.run.windows.x64.checked.mch	+0.01%
libraries_tests.run.windows.x64.Release.mch	+0.15%

FullOpts (-0.02% to +0.21%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.07%
benchmarks.run_pgo.windows.x64.checked.mch	-0.02%
coreclr_tests.run.windows.x64.checked.mch	+0.01%
libraries_tests.run.windows.x64.Release.mch	+0.21%

Details here

Throughput diffs for linux/arm ran on windows/x86

Overall (-0.01% to +0.06%)

Collection	PDIFF
benchmarks.run_pgo.linux.arm.checked.mch	-0.01%
libraries.pmi.linux.arm.checked.mch	+0.01%
libraries_tests.run.linux.arm.Release.mch	+0.06%

FullOpts (-0.01% to +0.08%)

Collection	PDIFF
benchmarks.run_pgo.linux.arm.checked.mch	-0.01%
libraries.pmi.linux.arm.checked.mch	+0.01%
libraries_tests.run.linux.arm.Release.mch	+0.08%

Throughput diffs for windows/x86 ran on windows/x86

Overall (-0.02% to +0.01%)

Collection	PDIFF
benchmarks.run_pgo.windows.x86.checked.mch	-0.02%
libraries_tests.run.windows.x86.Release.mch	+0.01%

FullOpts (-0.02% to +0.02%)

Collection	PDIFF
benchmarks.run_pgo.windows.x86.checked.mch	-0.02%
libraries_tests.run.windows.x86.Release.mch	+0.02%

Details here

Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.01% to +0.18%)

Collection	PDIFF
libraries_tests.run.linux.arm64.Release.mch	+0.18%
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%

FullOpts (-0.01% to +0.24%)

Collection	PDIFF
libraries_tests.run.linux.arm64.Release.mch	+0.24%
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%

Throughput diffs for linux/x64 ran on linux/x64

Overall (-0.02% to +0.20%)

Collection	PDIFF
libraries_tests.run.linux.x64.Release.mch	+0.20%
coreclr_tests.run.linux.x64.checked.mch	+0.01%
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%

FullOpts (-0.02% to +0.26%)

Collection	PDIFF
libraries_tests.run.linux.x64.Release.mch	+0.26%
coreclr_tests.run.linux.x64.checked.mch	+0.01%
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%

Details here

Feb 13 '24 00:02 ryujit-bot

Diff results for #98324

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,257,223 contexts (832,052 MinOpts, 1,425,171 FullOpts).

MISSED contexts: base: 73,583 (3.16%), diff: 73,599 (3.16%)

Overall (+122,712 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch	65,676,478	+36,494
coreclr_tests.run.linux.arm.checked.mch	321,682,372	+4,784
libraries.pmi.linux.arm.checked.mch	50,272,220	+36
libraries_tests.run.linux.arm.Release.mch	239,445,652	+79,630
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch	94,257,664	+1,768

FullOpts (+122,712 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch	53,624,530	+36,494
coreclr_tests.run.linux.arm.checked.mch	109,216,788	+4,784
libraries.pmi.linux.arm.checked.mch	50,165,996	+36
libraries_tests.run.linux.arm.Release.mch	117,579,462	+79,630
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch	84,227,820	+1,768

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,678,702 contexts (1,054,747 MinOpts, 1,623,955 FullOpts).

MISSED contexts: base: 11 (0.00%), diff: 656 (0.02%)

Overall (-51,776 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch	55,285,524	-117,707
coreclr_tests.run.windows.x86.checked.mch	371,677,826	+6,755
libraries.pmi.windows.x86.checked.mch	49,759,154	+3,071
libraries_tests.run.windows.x86.Release.mch	206,782,528	+44,446
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch	112,705,032	+11,659

FullOpts (-51,776 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch	44,439,683	-117,707
coreclr_tests.run.windows.x86.checked.mch	119,096,783	+6,755
libraries.pmi.windows.x86.checked.mch	49,663,921	+3,071
libraries_tests.run.windows.x86.Release.mch	97,462,256	+44,446
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch	103,862,275	+11,659

Details here

Feb 13 '24 01:02 ryujit-bot

Hmm, rather bigger diffs than I was expecting.

I will need to dig in and see if this is all attributable to more cloning, and whether it is time to at least build some kind of vague heuristic.

Feb 13 '24 01:02 AndyAyersMS

Looks like regressions are indeed from more cloning.

Feb 26 '24 23:02 AndyAyersMS

In particular, type test cloning is driven by the likelihood of the type test succeeding, and with this profile update we now see more tests that appear likely to succeed.
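To make the effect concrete, here is a small sketch of how a lower block weight can flip a likelihood-driven decision. Everything in it is an assumption for illustration: the function names, the idea of estimating likelihood as success-edge weight divided by block weight, and the 0.9 threshold are all hypothetical and are not taken from the JIT's actual cloning heuristic.

```python
def success_likelihood(success_edge_weight: float, test_block_weight: float) -> float:
    """Estimate how often the type test succeeds (assumed model: edge / block)."""
    if test_block_weight <= 0.0:
        return 0.0
    return min(1.0, success_edge_weight / test_block_weight)


def should_clone_for_type_test(likelihood: float, threshold: float = 0.9) -> bool:
    """Clone only when the test looks likely enough to succeed.

    The 0.9 threshold is a made-up value for this sketch.
    """
    return likelihood >= threshold


# Same success edge weight, but the block weight shrinks after the update.
stale = success_likelihood(66.0, 100.0)    # block weight before the update
updated = success_likelihood(66.0, 70.0)   # block weight after the update
clone_before = should_clone_for_type_test(stale)
clone_after = should_clone_for_type_test(updated)
```

Under these assumed numbers the test does not qualify with the stale block weight but does qualify once the weight is reduced, which is one way more loops could end up being cloned.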

Feb 26 '24 23:02 AndyAyersMS

@amanasifkhalid can you take another look? I removed the Next block and now just wire up the flow directly.

TP diffs look good, PerfScore diffs look good. Code size increases, but mainly in the libraries tests. The code size impact is all from more or fewer clones; all the ones I saw were from the "clone for type test" heuristic, which relies on profile data.
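For a sense of scale, the snippet below redoes the arithmetic on the windows/x64 FullOpts rows from the earlier diff report in this thread. The numbers for the later run are not shown here, so this only illustrates the pattern of libraries tests dominating the increase. The helper names are hypothetical and the collection names are shortened.

```python
# windows/x64 FullOpts rows from the earlier diff report in this thread:
# collection -> (base size in bytes, size delta in bytes)
WINDOWS_X64_FULLOPTS = {
    "aspnet.run": (28_268_214, 26_975),
    "benchmarks.run_pgo": (23_454_275, -82_475),
    "coreclr_tests.run": (130_697_415, 19_542),
    "libraries.pmi": (64_058_534, 3_231),
    "libraries_tests.run": (110_763_662, 259_697),
    "libraries_tests_no_tiered_compilation.run": (138_621_801, 12_991),
}


def total_delta(rows: dict) -> int:
    """Sum the per-collection size deltas, in bytes."""
    return sum(delta for _base, delta in rows.values())


def relative_delta_percent(rows: dict, name: str) -> float:
    """Size delta of one collection as a percentage of its base size."""
    base, delta = rows[name]
    return 100.0 * delta / base


def largest_contributor(rows: dict) -> str:
    """Name of the collection with the largest size increase."""
    return max(rows, key=lambda name: rows[name][1])


overall = total_delta(WINDOWS_X64_FULLOPTS)
biggest = largest_contributor(WINDOWS_X64_FULLOPTS)
biggest_pct = relative_delta_percent(WINDOWS_X64_FULLOPTS, biggest)
```

The per-collection deltas sum to the reported overall figure of +239,961 bytes, and the largest single contributor is the libraries tests collection, at roughly a quarter of a percent of its base size.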

Feb 27 '24 15:02 AndyAyersMS

The failure is a timeout in the SPMI replay for linux arm32.

Feb 27 '24 15:02 AndyAyersMS

Improvements on arm64:

https://github.com/dotnet/perf-autofiling-issues/issues/30175

Feb 29 '24 17:02 EgorBo

Improvements on arm64:

https://github.com/dotnet/perf-autofiling-issues/issues/30633
https://github.com/dotnet/perf-autofiling-issues/issues/30646
https://github.com/dotnet/perf-autofiling-issues/issues/30660

Mar 07 '24 17:03 EgorBo
