llvm-project
llvm-project copied to clipboard
IPSCCP pass can't reach a fixed-point when conditions have dependency
https://alive2.llvm.org/ce/z/FCT5eJ
define void @src(i1 %a, i1 %b) {
entry:
br i1 %a, label %a.true_b.false, label %a.false
a.false: ; preds = %entry
br i1 %b, label %b.true, label %if.end
b.true: ; preds = %a.false
br i1 %b, label %if.end, label %a.true_b.false
a.true_b.false: ; preds = %b.true, %entry
br i1 %a, label %if.end, label %if.then
if.then: ; preds = %a.true_b.false
call void @use()
br label %if.end
if.end: ; preds = %if.then, %a.true_b.false, %b.true, %a.false
ret void
}
declare void @use()
First time SCCPSolver's log:
markOverdefined: i1 %a
markOverdefined: i1 %b
Popped off OI-WL: i1 %b
Popped off OI-WL: i1 %a
Marking Block Executable: a.true_b.false
Marking Block Executable: a.false
Popped off BBWL:
a.false: ; preds = %entry
%b.0 = call i1 @llvm.ssa.copy.i1(i1 %b)
br i1 %b, label %b.true, label %if.end
Merged constantrange<-1, 0> into %b.0 = call i1 @llvm.ssa.copy.i1(i1 %b) : constantrange<-1, 0>
Marking Block Executable: b.true
Marking Block Executable: if.end
Popped off BBWL:
if.end: ; preds = %if.then, %a.true_b.false, %b.true, %a.false
ret void
Popped off BBWL:
b.true: ; preds = %a.false
br i1 %b.0, label %if.end, label %a.true_b.false
Marking Edge Executable: b.true -> if.end
Popped off BBWL:
a.true_b.false: ; preds = %b.true, %entry
br i1 %a, label %if.end, label %if.then
Marking Edge Executable: a.true_b.false -> if.end
Marking Block Executable: if.then
Popped off BBWL:
if.then: ; preds = %a.true_b.false
call void @use()
br label %if.end
Marking Edge Executable: if.then -> if.end
Popped off BBWL:
entry:
br i1 %a, label %a.true_b.false, label %a.false
Popped off I-WL: %b.0 = call i1 @llvm.ssa.copy.i1(i1 %b)
We need to run IPSCCP twice to eliminate the dead branch with function @use
.
The issue here is that IPSCCP is using PredicateInfo to propagate conditional information. PredicateInfo only tracks information in blocks dominated by the true & false blocks of the branch. At construction time, neither a.false
nor a.true_b.false
dominate a.true_b.false
, so we do not catch it.
The issue here is that IPSCCP is using PredicateInfo to propagate conditional information. PredicateInfo only tracks information in blocks dominated by the true & false blocks of the branch. At construction time, neither
a.false
nora.true_b.false
dominatea.true_b.false
, so we do not catch it.
Can we add a function to mark edge not executable when we propagate %b.0 = call i1 @llvm.ssa.copy.i1(i1 %b) to fix the issue? Or loop runIPSCCP until not changed?
I think the edge .true->a.true_b.false
is already not marked as executable. This is the point where we could in theory connect the condition to the earlier one, but I am not sure how to do this in a scalable way.
Not sure if running IPSCCP is desirable, it is already quite expensive.
It looks PredicateInfo doesn't support update dynamic also. No easy way to make it work.
Trying to add some test code here https://github.com/bcl5980/llvm-project/tree/perf/ipsccp-loop A lot of compile time increase but got a little performance gain on SPEC2017, so maybe we need to find some other way to improve.
ipsccp-loop | base | ||||
---|---|---|---|---|---|
500.perlbench_r | 355.2289 | 4.4816169 | 365.9355 | 4.350494 | |
502.gcc_r | 247.9069 | 5.7118226 | 251.1451 | 5.638175 | |
505.mcf_r | 307.4327 | 5.2564351 | 314.2372 | 5.142612 | |
520.omnetpp_r | 489.193 | 2.6819681 | 517.4859 | 2.535335 | |
523.xalancbmk_r | 215.4812 | 4.9006594 | 220.6564 | 4.785721 | |
525.x264_r | 203.8385 | 8.5901359 | 204.6118 | 8.557669 | |
531.deepsjeng_r | 280.6433 | 4.083475 | 285.7085 | 4.011081 | |
541.leela_r | 438.2885 | 3.7783331 | 432.7863 | 3.826369 | |
548.exchange2_r | |||||
557.xz_r | 314.2906 | 3.4363098 | 322.7374 | 3.346373 | |
SPECrate2017_int_base | 4.5313302 | 4.441562 |
Try loop runipsccp to a fix point: The loop count statistcs is here:
External/S...6.blender_r/526.blender_r.test 518.00
External/S...lancbmk_r/523.xalancbmk_r.test 230.00
External/S...lancbmk_s/623.xalancbmk_s.test 230.00
External/S...510.parest_r/510.parest_r.test 176.00
MultiSourc...marks/7zip/7zip-benchmark.test 157.00
External/S...0.omnetpp_s/620.omnetpp_s.test 153.00
External/S...0.omnetpp_r/520.omnetpp_r.test 153.00
External/S...8.imagick_s/638.imagick_s.test 74.00
External/S...8.imagick_r/538.imagick_r.test 74.00
MultiSourc...lications/ClamAV/clamscan.test 71.00
External/S...rlbench_s/600.perlbench_s.test 58.00
External/S...rlbench_r/500.perlbench_r.test 58.00
External/S...511.povray_r/511.povray_r.test 58.00
MultiSourc.../Applications/SPASS/SPASS.test 46.00
MultiSourc...ications/JM/lencod/lencod.test 45.00
MultiSourc.../Benchmarks/Bullet/bullet.test 41.00
MultiSourc...-typeset/consumer-typeset.test 40.00
MultiSourc...ocBench/espresso/espresso.test 36.00
MultiSourc...chmarks/MallocBench/gs/gs.test 35.00
MultiSourc...abench/jpeg/jpeg-6a/cjpeg.test 32.00
External/S...17speed/657.xz_s/657.xz_s.test 31.00
MultiSourc...nsumer-jpeg/consumer-jpeg.test 31.00
External/S...017rate/557.xz_r/557.xz_r.test 31.00
External/S...eed/625.x264_s/625.x264_s.test 28.00
External/S...ate/525.x264_r/525.x264_r.test 28.00
MultiSourc...CI_Purple/SMG2000/smg2000.test 27.00
MultiSourc...ProxyApps-C++/CLAMR/CLAMR.test 26.00
MultiSourc...ications/JM/ldecod/ldecod.test 21.00
MultiSourc...nsumer-lame/consumer-lame.test 20.00
MultiSourc...arks/mafft/pairlocalalign.test 19.00
MultiSource/Applications/lua/lua.test 18.00
MultiSourc...s/MallocBench/cfrac/cfrac.test 18.00
External/S...d/641.leela_s/641.leela_s.test 16.00
External/S...e/541.leela_r/541.leela_r.test 16.00
External/S...epsjeng_s/631.deepsjeng_s.test 12.00
MultiSourc...Applications/kimwitu++/kc.test 12.00
External/S...epsjeng_r/531.deepsjeng_r.test 12.00
External/S...speed/644.nab_s/644.nab_s.test 12.00
External/S...7rate/544.nab_r/544.nab_r.test 12.00
MultiSourc.../Prolangs-C/bison/mybison.test 11.00
MultiSourc...yApps-C++/PENNANT/PENNANT.test 11.00
MultiSourc...plications/d/make_dparser.test 10.00
MultiSourc...telecomm-gsm/telecomm-gsm.test 9.00
MultiSourc...ks/Prolangs-C/agrep/agrep.test 9.00
MultiSourc...lications/obsequi/Obsequi.test 9.00
External/S...ate/508.namd_r/508.namd_r.test 9.00
MultiSourc...ediabench/gsm/toast/toast.test 9.00
MultiSourc.../Applications/spiff/spiff.test 8.00
MultiSource/Applications/hbd/hbd.test 7.00
MultiSourc...ProxyApps-C++/HPCCG/HPCCG.test 7.00
MultiSourc...ks/Prolangs-C++/city/city.test 7.00
MultiSourc...peg2/mpeg2dec/mpeg2decode.test 7.00
MultiSourc...encode/alacconvert-encode.test 6.00
MultiSourc...s-C/Pathfinder/PathFinder.test 6.00
MultiSourc...oxyApps-C/miniAMR/miniAMR.test 6.00
MultiSourc...ch/g721/g721encode/encode.test 6.00
MicroBench...ALambdaLoops/lcalsALambda.test 6.00
MultiSourc...oxyApps-C++/miniFE/miniFE.test 6.00
MultiSourc...decode/alacconvert-decode.test 6.00
MicroBench...CLambdaLoops/lcalsCLambda.test 6.00
MultiSourc...arks/VersaBench/dbms/dbms.test 6.00
MicroBench...SubsetCRawLoops/lcalsCRaw.test 6.00
MultiSourc...cations/hexxagon/hexxagon.test 5.00
External/S...speed/605.mcf_s/605.mcf_s.test 5.00
MicroBench...SubsetARawLoops/lcalsARaw.test 5.00
MicroBench...SubsetBRawLoops/lcalsBRaw.test 5.00
External/S...7rate/505.mcf_r/505.mcf_r.test 5.00
MicroBench...BLambdaLoops/lcalsBLambda.test 5.00
MicroBench...eProcessing/Dilate/Dilate.test 4.00
MultiSourc...pps-C/SimpleMOC/SimpleMOC.test 4.00
MultiSourc...oxyApps-C/XSBench/XSBench.test 4.00
MicroBench...eProcessing/Dither/Dither.test 4.00
MultiSourc...marks/Ptrdist/yacr2/yacr2.test 4.00
MultiSourc...DOE-ProxyApps-C/CoMD/CoMD.test 4.00
MultiSourc.../Benchmarks/Ptrdist/bc/bc.test 4.00
MultiSourc...tions/lambda-0.1.3/lambda.test 4.00
MultiSourc...ks/Prolangs-C/gnugo/gnugo.test 4.00
MultiSourc...oxyApps-C/miniGMG/miniGMG.test 3.00
MultiSourc...count/automotive-bitcount.test 3.00
MultiSourc...nch/pcompress2/pcompress2.test 3.00
MultiSourc...oxyApps-C/RSBench/rsbench.test 3.00
MultiSourc.../Benchmarks/nbench/nbench.test 3.00
MultiSourc...s/ASC_Sequoia/AMGmk/AMGmk.test 3.00
MicroBench...sion/AnisotropicDiffusion.test 3.00
MicroBench...Filtering/BilateralFilter.test 3.00
MicroBench...ImageProcessing/Blur/blur.test 3.00
MultiSourc...lications/viterbi/viterbi.test 3.00
MicroBench...terpolation/Interpolation.test 3.00
MultiSource/Applications/siod/siod.test 3.00
MultiSourc.../Applications/sgefa/sgefa.test 3.00
MultiSourc...lications/SIBsim4/SIBsim4.test 3.00
MultiSourc...adpcm/rawcaudio/rawcaudio.test 2.00
MultiSourc.../Benchmarks/Olden/mst/mst.test 2.00
MultiSourc...Rodinia/backprop/backprop.test 2.00
MultiSourc...nchmarks/McCat/05-eks/eks.test 2.00
MultiSourc...cCat/03-testtrie/testtrie.test 2.00
MultiSourc...marks/SciMark2-C/scimark2.test 2.00
MultiSourc...nia/pathfinder/pathfinder.test 2.00
MultiSourc...s/Rodinia/hotspot/hotspot.test 2.00
MultiSourc...patricia/network-patricia.test 2.00
MultiSourc...rks/FreeBench/pifft/pifft.test 2.00
MultiSourc...lications/minisat/minisat.test 2.00
MultiSourc...lications/sqlite3/sqlite3.test 2.00
MicroBench...opVectorizationBenchmarks.test 2.00
MultiSourc...s/Fhourstones/fhourstones.test 2.00
sccp.NumIPSccpRun
run ipsccp_loop
count 225.000000
mean 13.795556
std 46.372839
min 1.000000
25% 1.000000
50% 1.000000
75% 6.000000
max 518.000000
It looks we have a lot of cases benefit from ipsccp-loop
Try loop runipsccp to a fix point: The loop count statistcs is here:
Is that the number of times we re-run IPSCCP? It may be interesting to see the difference in number of simplifications applied? Should be doable with the existing ipsccp stats.
Try loop runipsccp to a fix point: The loop count statistcs is here:
Is that the number of times we re-run IPSCCP? It may be interesting to see the difference in number of simplifications applied? Should be doable with the existing ipsccp stats.
Yes, it is the number of times we re-run IPSCCP. Is this stat you want? IPNumInstRemoved.txt