alive2 Bug in unroller with nested loop

Reduced test case from @regehr:

define void @test() {
entry:
  %i = alloca i32, align 4
  %j = alloca i32, align 4
  store i32 0, i32* %i, align 4
  br label %for.body

for.body:
  br label %for.body4

for.body4:
  br label %for.inc

for.inc:
  %n7 = load i32, i32* %j, align 4
  %inc = add i32 %n7, 1
  store i32 %inc, i32* %j, align 4
  %cmp2 = icmp ult i32 %inc, 2
  br i1 %cmp2, label %for.body4, label %for.inc5

for.inc5:
  %n9 = load i32, i32* %i, align 4
  %inc6 = add i32 %n9, 1
  store i32 %inc6, i32* %i, align 4
  %cmp = icmp ult i32 %inc6, 2
  br i1 %cmp, label %for.body, label %for.end7

for.end7:
  ret void
}

Equivalent C:

void test() {
  for (unsigned i = 0; i < 2; i++) {
    for (unsigned j = 0; j < 2; j++) { }
  }
}

Repro:

$ opt -S -jump-threading bug.ll -o optimized.ll
$ alive-tv -src-unroll=2 -tgt-unroll=2 bug.ll optimized.ll

ERROR: The target program doesn't reach a return instruction.

Sep 20 '21 19:09 nunoplopes

I have asked @zhengyang92 to look into this one

Sep 20 '21 19:09 regehr

To eliminate the unnecessary jumps, two nested loops may share a common header. In this reduced example, the jump-threading pass turns the loops that have separate headers into loops sharing one common header. The Havlak's algorithm we're using recognizes such CFG as a single loop rather than two, and this causes the missed unroll of the inner loop. The LLVM loop analysis passes have similar problems. These passes first canonicalizes the loops back to loops using separated headers and then performs the analysis.

@nunoplopes What should we do to handle such cases? I guess we need to implement a loop canonicalizer that runs before the unroller.

Sep 21 '21 20:09 zhengyang92

@zhengyang92 I guess the canonicalization code you're talking about is in Transforms/Utils/LoopSimplify.cpp? do you have a sense for how hard it will be to do this in alive?

Sep 21 '21 22:09 regehr

I guess we need to reimplement part of function in alive2 https://llvm.org/doxygen/LoopSimplify_8cpp_source.html#l00220. After it is implemented alive2 will be able to tell that these are two loops instead of one. However I am not completely sure if this particular canonicalization rewrite will fix all mis-identified loops or we need a few other canonicalizations. I'll take a look at other cases triggered by Vsevolod's fuzzer and check if there's any other loops that cannot be handled by our analyzer.

Sep 22 '21 03:09 zhengyang92

Good catch! One of the Havlak's papers has algorithms to pre-process loops. I forget the details. But it's true that Alive2's loop algorithms need to be as powerful as LLVM's. So we do need some pre-processing to avoid these alarms. At least it doesn't trigger false-positives. Need to recheck John's example though, as I'm not sure anymore this reduced example exhibits the same issue.

Sep 22 '21 12:09 nunoplopes

Another case from LLVM's test suite:

./opt-alive-test.sh ~/llvm/llvm/test/Transforms/LoopSimplifyCFG/constant-fold-branch.ll -p=loop-simplifycfg -tv-src-unroll=2 -tv-tgt-unroll=2

At least 2 phi nodes have wrong predecessors:

%backedge:
  br i1 %cmp, label %header#1#2, label %outer_backedge

%header#2:
  %i#2 = phi i32 [ 0, %preheader#2 ], [ %i.inc, %backedge ]
  br i1 1, label %backedge#2, label %dead#2


%backedge#1#2#2:
  br i1 %cmp#1#2#2, label %header#exit#2, label %outer_backedge#2

%outer_backedge#2:
  %i.inc.lcssa#2 = phi i32 [ %i.inc, %backedge ], [ %i.inc#1#2, %backedge#1#2 ]

Jan 05 '23 13:01 nunoplopes

alive2 alive2 copied to clipboard

Bug in unroller with nested loop

alive2
alive2 copied to clipboard