llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

IPSCCP pass can't reach a fixed-point when conditions have dependency

Open bcl5980 opened this issue 2 years ago • 4 comments

https://alive2.llvm.org/ce/z/FCT5eJ

define void @src(i1 %a, i1 %b) {
entry:
  br i1 %a, label %a.true_b.false, label %a.false
a.false:                                          ; preds = %entry
  br i1 %b, label %b.true, label %if.end
b.true:                                           ; preds = %a.false
  br i1 %b, label %if.end, label %a.true_b.false
a.true_b.false:                                   ; preds = %b.true, %entry
  br i1 %a, label %if.end, label %if.then
if.then:                                          ; preds = %a.true_b.false
  call void @use()
  br label %if.end
if.end:                                           ; preds = %if.then, %a.true_b.false, %b.true, %a.false
  ret void
}

declare void @use()

First time SCCPSolver's log:

markOverdefined: i1 %a
markOverdefined: i1 %b

Popped off OI-WL: i1 %b

Popped off OI-WL: i1 %a
Marking Block Executable: a.true_b.false
Marking Block Executable: a.false

Popped off BBWL: 
a.false:                                          ; preds = %entry
  %b.0 = call i1 @llvm.ssa.copy.i1(i1 %b)
  br i1 %b, label %b.true, label %if.end

Merged constantrange<-1, 0> into   %b.0 = call i1 @llvm.ssa.copy.i1(i1 %b) : constantrange<-1, 0>
Marking Block Executable: b.true
Marking Block Executable: if.end

Popped off BBWL: 
if.end:                                           ; preds = %if.then, %a.true_b.false, %b.true, %a.false
  ret void


Popped off BBWL: 
b.true:                                           ; preds = %a.false
  br i1 %b.0, label %if.end, label %a.true_b.false

Marking Edge Executable: b.true -> if.end

Popped off BBWL: 
a.true_b.false:                                   ; preds = %b.true, %entry
  br i1 %a, label %if.end, label %if.then

Marking Edge Executable: a.true_b.false -> if.end
Marking Block Executable: if.then

Popped off BBWL: 
if.then:                                          ; preds = %a.true_b.false
  call void @use()
  br label %if.end

Marking Edge Executable: if.then -> if.end

Popped off BBWL: 
entry:
  br i1 %a, label %a.true_b.false, label %a.false


Popped off I-WL:   %b.0 = call i1 @llvm.ssa.copy.i1(i1 %b)

We need to run IPSCCP twice to eliminate the dead branch with function @use.

bcl5980 avatar Aug 17 '22 07:08 bcl5980

The issue here is that IPSCCP is using PredicateInfo to propagate conditional information. PredicateInfo only tracks information in blocks dominated by the true & false blocks of the branch. At construction time, neither a.false nor a.true_b.false dominate a.true_b.false, so we do not catch it.

fhahn avatar Aug 17 '22 07:08 fhahn

The issue here is that IPSCCP is using PredicateInfo to propagate conditional information. PredicateInfo only tracks information in blocks dominated by the true & false blocks of the branch. At construction time, neither a.false nor a.true_b.false dominate a.true_b.false, so we do not catch it.

Can we add a function to mark edge not executable when we propagate %b.0 = call i1 @llvm.ssa.copy.i1(i1 %b) to fix the issue? Or loop runIPSCCP until not changed?

bcl5980 avatar Aug 17 '22 07:08 bcl5980

I think the edge .true->a.true_b.false is already not marked as executable. This is the point where we could in theory connect the condition to the earlier one, but I am not sure how to do this in a scalable way.

Not sure if running IPSCCP is desirable, it is already quite expensive.

fhahn avatar Aug 17 '22 08:08 fhahn

It looks PredicateInfo doesn't support update dynamic also. No easy way to make it work.

bcl5980 avatar Aug 17 '22 08:08 bcl5980

Trying to add some test code here https://github.com/bcl5980/llvm-project/tree/perf/ipsccp-loop A lot of compile time increase but got a little performance gain on SPEC2017, so maybe we need to find some other way to improve.

    ipsccp-loop     base
500.perlbench_r 355.2289 4.4816169   365.9355 4.350494
502.gcc_r 247.9069 5.7118226   251.1451 5.638175
505.mcf_r 307.4327 5.2564351   314.2372 5.142612
520.omnetpp_r 489.193 2.6819681   517.4859 2.535335
523.xalancbmk_r 215.4812 4.9006594   220.6564 4.785721
525.x264_r 203.8385 8.5901359   204.6118 8.557669
531.deepsjeng_r 280.6433 4.083475   285.7085 4.011081
541.leela_r 438.2885 3.7783331   432.7863 3.826369
548.exchange2_r          
557.xz_r 314.2906 3.4363098   322.7374 3.346373
           
SPECrate2017_int_base   4.5313302     4.441562

bcl5980 avatar Aug 23 '22 12:08 bcl5980

Try loop runipsccp to a fix point: The loop count statistcs is here:

External/S...6.blender_r/526.blender_r.test 518.00
External/S...lancbmk_r/523.xalancbmk_r.test 230.00
External/S...lancbmk_s/623.xalancbmk_s.test 230.00
External/S...510.parest_r/510.parest_r.test 176.00
MultiSourc...marks/7zip/7zip-benchmark.test 157.00
External/S...0.omnetpp_s/620.omnetpp_s.test 153.00
External/S...0.omnetpp_r/520.omnetpp_r.test 153.00
External/S...8.imagick_s/638.imagick_s.test 74.00
External/S...8.imagick_r/538.imagick_r.test 74.00
MultiSourc...lications/ClamAV/clamscan.test 71.00
External/S...rlbench_s/600.perlbench_s.test 58.00
External/S...rlbench_r/500.perlbench_r.test 58.00
External/S...511.povray_r/511.povray_r.test 58.00
MultiSourc.../Applications/SPASS/SPASS.test 46.00
MultiSourc...ications/JM/lencod/lencod.test 45.00
MultiSourc.../Benchmarks/Bullet/bullet.test 41.00
MultiSourc...-typeset/consumer-typeset.test 40.00
MultiSourc...ocBench/espresso/espresso.test 36.00
MultiSourc...chmarks/MallocBench/gs/gs.test 35.00
MultiSourc...abench/jpeg/jpeg-6a/cjpeg.test 32.00
External/S...17speed/657.xz_s/657.xz_s.test 31.00
MultiSourc...nsumer-jpeg/consumer-jpeg.test 31.00
External/S...017rate/557.xz_r/557.xz_r.test 31.00
External/S...eed/625.x264_s/625.x264_s.test 28.00
External/S...ate/525.x264_r/525.x264_r.test 28.00
MultiSourc...CI_Purple/SMG2000/smg2000.test 27.00
MultiSourc...ProxyApps-C++/CLAMR/CLAMR.test 26.00
MultiSourc...ications/JM/ldecod/ldecod.test 21.00
MultiSourc...nsumer-lame/consumer-lame.test 20.00
MultiSourc...arks/mafft/pairlocalalign.test 19.00
MultiSource/Applications/lua/lua.test 18.00
MultiSourc...s/MallocBench/cfrac/cfrac.test 18.00
External/S...d/641.leela_s/641.leela_s.test 16.00
External/S...e/541.leela_r/541.leela_r.test 16.00
External/S...epsjeng_s/631.deepsjeng_s.test 12.00
MultiSourc...Applications/kimwitu++/kc.test 12.00
External/S...epsjeng_r/531.deepsjeng_r.test 12.00
External/S...speed/644.nab_s/644.nab_s.test 12.00
External/S...7rate/544.nab_r/544.nab_r.test 12.00
MultiSourc.../Prolangs-C/bison/mybison.test 11.00
MultiSourc...yApps-C++/PENNANT/PENNANT.test 11.00
MultiSourc...plications/d/make_dparser.test 10.00
MultiSourc...telecomm-gsm/telecomm-gsm.test 9.00
MultiSourc...ks/Prolangs-C/agrep/agrep.test 9.00
MultiSourc...lications/obsequi/Obsequi.test 9.00
External/S...ate/508.namd_r/508.namd_r.test 9.00
MultiSourc...ediabench/gsm/toast/toast.test 9.00
MultiSourc.../Applications/spiff/spiff.test 8.00
MultiSource/Applications/hbd/hbd.test 7.00
MultiSourc...ProxyApps-C++/HPCCG/HPCCG.test 7.00
MultiSourc...ks/Prolangs-C++/city/city.test 7.00
MultiSourc...peg2/mpeg2dec/mpeg2decode.test 7.00
MultiSourc...encode/alacconvert-encode.test 6.00
MultiSourc...s-C/Pathfinder/PathFinder.test 6.00
MultiSourc...oxyApps-C/miniAMR/miniAMR.test 6.00
MultiSourc...ch/g721/g721encode/encode.test 6.00
MicroBench...ALambdaLoops/lcalsALambda.test 6.00
MultiSourc...oxyApps-C++/miniFE/miniFE.test 6.00
MultiSourc...decode/alacconvert-decode.test 6.00
MicroBench...CLambdaLoops/lcalsCLambda.test 6.00
MultiSourc...arks/VersaBench/dbms/dbms.test 6.00
MicroBench...SubsetCRawLoops/lcalsCRaw.test 6.00
MultiSourc...cations/hexxagon/hexxagon.test 5.00
External/S...speed/605.mcf_s/605.mcf_s.test 5.00
MicroBench...SubsetARawLoops/lcalsARaw.test 5.00
MicroBench...SubsetBRawLoops/lcalsBRaw.test 5.00
External/S...7rate/505.mcf_r/505.mcf_r.test 5.00
MicroBench...BLambdaLoops/lcalsBLambda.test 5.00
MicroBench...eProcessing/Dilate/Dilate.test 4.00
MultiSourc...pps-C/SimpleMOC/SimpleMOC.test 4.00
MultiSourc...oxyApps-C/XSBench/XSBench.test 4.00
MicroBench...eProcessing/Dither/Dither.test 4.00
MultiSourc...marks/Ptrdist/yacr2/yacr2.test 4.00
MultiSourc...DOE-ProxyApps-C/CoMD/CoMD.test 4.00
MultiSourc.../Benchmarks/Ptrdist/bc/bc.test 4.00
MultiSourc...tions/lambda-0.1.3/lambda.test 4.00
MultiSourc...ks/Prolangs-C/gnugo/gnugo.test 4.00
MultiSourc...oxyApps-C/miniGMG/miniGMG.test 3.00
MultiSourc...count/automotive-bitcount.test 3.00
MultiSourc...nch/pcompress2/pcompress2.test 3.00
MultiSourc...oxyApps-C/RSBench/rsbench.test 3.00
MultiSourc.../Benchmarks/nbench/nbench.test 3.00
MultiSourc...s/ASC_Sequoia/AMGmk/AMGmk.test 3.00
MicroBench...sion/AnisotropicDiffusion.test 3.00
MicroBench...Filtering/BilateralFilter.test 3.00
MicroBench...ImageProcessing/Blur/blur.test 3.00
MultiSourc...lications/viterbi/viterbi.test 3.00
MicroBench...terpolation/Interpolation.test 3.00
MultiSource/Applications/siod/siod.test 3.00
MultiSourc.../Applications/sgefa/sgefa.test 3.00
MultiSourc...lications/SIBsim4/SIBsim4.test 3.00
MultiSourc...adpcm/rawcaudio/rawcaudio.test 2.00
MultiSourc.../Benchmarks/Olden/mst/mst.test 2.00
MultiSourc...Rodinia/backprop/backprop.test 2.00
MultiSourc...nchmarks/McCat/05-eks/eks.test 2.00
MultiSourc...cCat/03-testtrie/testtrie.test 2.00
MultiSourc...marks/SciMark2-C/scimark2.test 2.00
MultiSourc...nia/pathfinder/pathfinder.test 2.00
MultiSourc...s/Rodinia/hotspot/hotspot.test 2.00
MultiSourc...patricia/network-patricia.test 2.00
MultiSourc...rks/FreeBench/pifft/pifft.test 2.00
MultiSourc...lications/minisat/minisat.test 2.00
MultiSourc...lications/sqlite3/sqlite3.test 2.00
MicroBench...opVectorizationBenchmarks.test 2.00
MultiSourc...s/Fhourstones/fhourstones.test 2.00

  sccp.NumIPSccpRun

run ipsccp_loop count 225.000000
mean 13.795556
std 46.372839
min 1.000000
25% 1.000000
50% 1.000000
75% 6.000000
max 518.000000

It looks we have a lot of cases benefit from ipsccp-loop

bcl5980 avatar Aug 28 '22 04:08 bcl5980

Try loop runipsccp to a fix point: The loop count statistcs is here:

Is that the number of times we re-run IPSCCP? It may be interesting to see the difference in number of simplifications applied? Should be doable with the existing ipsccp stats.

fhahn avatar Aug 30 '22 12:08 fhahn

Try loop runipsccp to a fix point: The loop count statistcs is here:

Is that the number of times we re-run IPSCCP? It may be interesting to see the difference in number of simplifications applied? Should be doable with the existing ipsccp stats.

Yes, it is the number of times we re-run IPSCCP. Is this stat you want? IPNumInstRemoved.txt

bcl5980 avatar Aug 30 '22 13:08 bcl5980