alive2
alive2 copied to clipboard
Bug in unroller with nested loop
Reduced test case from @regehr:
define void @test() {
entry:
%i = alloca i32, align 4
%j = alloca i32, align 4
store i32 0, i32* %i, align 4
br label %for.body
for.body:
br label %for.body4
for.body4:
br label %for.inc
for.inc:
%n7 = load i32, i32* %j, align 4
%inc = add i32 %n7, 1
store i32 %inc, i32* %j, align 4
%cmp2 = icmp ult i32 %inc, 2
br i1 %cmp2, label %for.body4, label %for.inc5
for.inc5:
%n9 = load i32, i32* %i, align 4
%inc6 = add i32 %n9, 1
store i32 %inc6, i32* %i, align 4
%cmp = icmp ult i32 %inc6, 2
br i1 %cmp, label %for.body, label %for.end7
for.end7:
ret void
}
Equivalent C:
void test() {
for (unsigned i = 0; i < 2; i++) {
for (unsigned j = 0; j < 2; j++) { }
}
}
Repro:
$ opt -S -jump-threading bug.ll -o optimized.ll
$ alive-tv -src-unroll=2 -tgt-unroll=2 bug.ll optimized.ll
ERROR: The target program doesn't reach a return instruction.
I have asked @zhengyang92 to look into this one
To eliminate the unnecessary jumps, two nested loops may share a common header. In this reduced example, the jump-threading pass turns the loops that have separate headers into loops sharing one common header. The Havlak's algorithm we're using recognizes such CFG as a single loop rather than two, and this causes the missed unroll of the inner loop. The LLVM loop analysis passes have similar problems. These passes first canonicalizes the loops back to loops using separated headers and then performs the analysis.
@nunoplopes What should we do to handle such cases? I guess we need to implement a loop canonicalizer that runs before the unroller.
@zhengyang92 I guess the canonicalization code you're talking about is in Transforms/Utils/LoopSimplify.cpp
?
do you have a sense for how hard it will be to do this in alive?
I guess we need to reimplement part of function in alive2 https://llvm.org/doxygen/LoopSimplify_8cpp_source.html#l00220. After it is implemented alive2 will be able to tell that these are two loops instead of one. However I am not completely sure if this particular canonicalization rewrite will fix all mis-identified loops or we need a few other canonicalizations. I'll take a look at other cases triggered by Vsevolod's fuzzer and check if there's any other loops that cannot be handled by our analyzer.
Good catch! One of the Havlak's papers has algorithms to pre-process loops. I forget the details. But it's true that Alive2's loop algorithms need to be as powerful as LLVM's. So we do need some pre-processing to avoid these alarms. At least it doesn't trigger false-positives. Need to recheck John's example though, as I'm not sure anymore this reduced example exhibits the same issue.
Another case from LLVM's test suite:
./opt-alive-test.sh ~/llvm/llvm/test/Transforms/LoopSimplifyCFG/constant-fold-branch.ll -p=loop-simplifycfg -tv-src-unroll=2 -tv-tgt-unroll=2
At least 2 phi nodes have wrong predecessors:
%backedge:
br i1 %cmp, label %header#1#2, label %outer_backedge
%header#2:
%i#2 = phi i32 [ 0, %preheader#2 ], [ %i.inc, %backedge ]
br i1 1, label %backedge#2, label %dead#2
%backedge#1#2#2:
br i1 %cmp#1#2#2, label %header#exit#2, label %outer_backedge#2
%outer_backedge#2:
%i.inc.lcssa#2 = phi i32 [ %i.inc, %backedge ], [ %i.inc#1#2, %backedge#1#2 ]