rust-legacy-fork
rust-legacy-fork copied to clipboard
[LLVM 6.0] flt2dec / dec2flt cause assertion failed: Wrong topological sorting
This is with my recently-rebased branch based on upstream Rust 190a6c. I've added the patch from #99 locally.
Case 1
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-b213db3.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"
; Function Attrs: uwtable
define align 1 void @core__num__flt2dec__strategy__dragon__mul_pow() unnamed_addr {
start:
%ret.i = alloca [40 x i32], align 1
%0 = zext i32 undef to i64
%1 = add nuw nsw i16 0, 11
br label %bb19.i105.11
bb19.i105.11: ; preds = %start
%2 = getelementptr inbounds [40 x i32], [40 x i32]* %ret.i, i16 0, i16 %1
%3 = load i32, i32* %2, align 1
%4 = mul nuw i64 %0, 2788392729
%5 = zext i32 %3 to i64
%6 = add nuw nsw i64 0, %5
%7 = add i64 %6, %4
%8 = trunc i64 %7 to i32
store i32 %8, i32* %2, align 1
unreachable
}
Case 2
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-95356ba.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"
%"num::bignum::Big32x40" = type { [0 x i8], i16, [0 x i8], [40 x i32], [0 x i8] }
; Function Attrs: uwtable
define void @"core::num::dec2flt::num::digits_to_big"() unnamed_addr {
start:
%_7.sroa.0.0..sroa_idx.i = getelementptr inbounds %"num::bignum::Big32x40", %"num::bignum::Big32x40"* undef, i16 0, i32 3, i16 0
switch i2 undef, label %bb5.i10 [
i2 0, label %bb2.i
i2 1, label %bb3.i
i2 -2, label %bb4.i
]
bb2.i: ; preds = %start
ret void
bb3.i: ; preds = %start
%0 = add i8 0, -48
%1 = zext i8 %0 to i32
%2 = load i32, i32* %_7.sroa.0.0..sroa_idx.i, align 1
%3 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %2, i32 %1)
%4 = extractvalue { i32, i1 } %3, 0
store i32 %4, i32* %_7.sroa.0.0..sroa_idx.i, align 1
unreachable
bb4.i: ; preds = %start
unreachable
bb5.i10: ; preds = %start
unreachable
}
; Function Attrs: nounwind readnone speculatable
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)
Error
Assertion failed: (Node2Index[SU.NodeNum] > Node2Index[PD.getSUnit()->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /Users/shep/Projects/avr-rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp, line 518.
/cc @brainlag
Since I got my bugpoint setup working and fixed all my paths, I can now report a new cra-- oh. :smile:
One piece of addition information: I did not apply #99 and I still got this crash in exactly the same place.
I've been looking into this issue a bit (with a slightly different repro since I'm using a newer LLVM version). The issue appears to be that whatever data is passed to ScheduleDAGTopologicalSort
, is not a DAG. The example where the assertion happens contains a cycle.
Note that this seems to be related to optimizations. Compiling with llc -O0
is fine, llc -O1
is not.
In case that helps anyone, here is what I believe is the relevant part of the -debug
output: https://gist.github.com/620098d70297404f3d02a8555363b916
The "LLVM ERROR: out of memory" seems easily explainable to me: LLVM apparently wants to compute the depth of the "DAG" for debug printing. Since the DAG contains a cycle, we get an infinite loop.
If you compile llvm with asserts you will hit that and don't run into OOM error. I debugged this in the past and IIRC the problem is because 2 nodes in the DAG are glued together which lead to the cycle. If these 2 nodes would not glued together it would be fine.
the problem is because 2 nodes in the DAG are glued together
I'm not quite sure what you mean with "glued together". In my case there is a cycle between three nodes (2, 3 and 6 in the Graph below).
Graph
There are glue nodes which glues 2 (or more?) nodes together While the IR code does not have a cycle, llvm wrongly fuses 2 instructions into 1 node which leads to a cycle in the DAG.
Ah, I wasn't aware that "glue nodes" are an LLVM concept. Anyway, I think I understand now what you meant and see it for my example as well. So, IIUC, the next step would probably be figuring out why the two nodes are glued together.
In any case, I'm including the relevant graphs below, in case they are useful to anyone:
ISel Input
Sched Input
Scheduled
Looking at some more graphs, I believe that the issue occurs during the combine2
phase:
Before, we already have the glue node, but not yet a potential cycle. After combine2
we now have t40
and the potential cycle.
Before Combine 2
After Combine 2
I think I know now what happens: In CombineToPostIndexedLoadStore
, LLVM checks if t21
and t35
can be combined.
This may only be done if
Op must be independent of N, i.e. Op is neither a predecessor nor a successor of N. Otherwise, if Op is folded that would create a cycle.
The check for "!Op->isPredecessorOf(N) && !N->isPredecessorOf(Op)" then "correctly" identifies that they are independent, if one considers only the edges actually present in the graph.
The issue now is that the glue link between t28
and t29
will in effect be bidirectional (rather than unidirectional) once the two nodes are combined.
I can see three solutions to this:
- This could be considered a bug in
CombineToPostIndexedLoadStore
, thusCombineToPostIndexedLoadStore
should be adapted to handle this case. - Not generate the glue in the first place. (If it only exists for cosmetic / performance reasons, it could just be dropped. If it must be present, then I believe the glue would need to be replaced by a pseudo / virtual instruction).
- Ignore the glue in the schedule when it would lead to a cycle.
Intuitively, I don't think that (1) is feasible and don't like (3) as a solution. I'll try to figure out why / how / where the glue is inserted in the first place.
Edit: Any thoughts on the three proposed solutions?
The glue nodes are generated by LLVM in DAGTypeLegalizer::ExpandIntRes_ADDSUB
. Given that, I don't believe that solution (2) is easily feasible anymore, since any changes here would affect much more than just AVR.
Imho, and i might be wrong, this is flaw in llvm. Either way I think you should discuss this with the llvm developers.
Also, when I looked up the code for CombineToPostIndexedLoadStore
I found this.
Yeah, I think you are correct that this can be considered an LLVM bug. I'll write to the mailing list or open a bug tomorrow.
Also, when I looked up the code for
CombineToPostIndexedLoadStore
I found this.
That commit should be a non-functional change. It just ensures that Op->isPredecessorOf(N)
and N->isPredecessorOf(Op)
don't do duplicate work.
llvm-dev thread (or rather, initial message): http://lists.llvm.org/pipermail/llvm-dev/2018-October/126851.html
Per the first reply on the ML, I've tried to disable the (add|sub)(c|e)
instructions for AVR (patch) and that makes my reproduction compile.
I don't know yet if that breaks something else. Worstcase, I hope that this only makes things slower, which could be fixed by adding support for the (ADD|SUB)CARRY
instructions.
Live-Edit: I've just ran the AVR CodeGen tests and took a closer look at add.ll
. As far as I can tell, the generated assembly is much worse (for comparison, the assembly without the patch).
And someone has a proposed fix for the glue
We've recently upgraded to LLVM 8 🎉 I'm going to close any bug that is reported against an older version of LLVM. If you are still having this issue with the LLVM-8 based code, please ping me and I can reopen the issue!
@shepmaster: This has a not-yet-upstreamed patch on the 8.0 branch, so I think we should keep the issue open?
Built the official 8.0 release code with AVR target enabled, llced the code in OP, and it seems llc
falls into a dead loop.
The way forward here is to convert the AVR backend from the old ADDE/ADDC/SUBE/SUBC
DAG opcodes to the UADDO/ADDCARRY/USUBO/SUBCARRY
DAG opcodes. The old ones have been deprecated for a while.
https://reviews.llvm.org/D53106#1443059
Summarizing offline discussion with jyknight for future reference:
This should only be necessary due to glued nodes that are not Copy{To,From}Reg and which have additional predecessors/successors than nodes on the glue. This appear to only occur in the case of ADDC/ADDE nodes have been deprecated.
This patch was originally written for such an instance in the AVR backend.
Hm, I can't reproduce this on LLVM trunk anymore, on either testcase.
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-95356ba.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"
%"num::bignum::Big32x40" = type { [0 x i8], i16, [0 x i8], [40 x i32], [0 x i8] }
; Function Attrs: uwtable
define void @"core::num::dec2flt::num::digits_to_big"() unnamed_addr {
start:
%_7.sroa.0.0..sroa_idx.i = getelementptr inbounds %"num::bignum::Big32x40", %"num::bignum::Big32x40"* undef, i16 0, i32 3, i16 0
switch i2 undef, label %bb5.i10 [
i2 0, label %bb2.i
i2 1, label %bb3.i
i2 -2, label %bb4.i
]
bb2.i: ; preds = %start
ret void
bb3.i: ; preds = %start
%0 = add i8 0, -48
%1 = zext i8 %0 to i32
%2 = load i32, i32* %_7.sroa.0.0..sroa_idx.i, align 1
%3 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %2, i32 %1)
%4 = extractvalue { i32, i1 } %3, 0
store i32 %4, i32* %_7.sroa.0.0..sroa_idx.i, align 1
unreachable
bb4.i: ; preds = %start
unreachable
bb5.i10: ; preds = %start
unreachable
}
; Function Attrs: nounwind readnone speculatable
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)
Using avr-rust at avr-support-aa35e73b
./llc -march=avr -o - -filetype=asm ~/projects/builds/llvm-project/llvm/foo.ll
Gives
.text
.file "bugpoint-output-95356ba.bc"
llc: /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp:507: void llvm::ScheduleDAGTopologicalSort::InitDAGTopologicalSorting(): Assertion `Node2Index[SU.NodeNum] > Node
2Index[PD.getSUnit()->NodeNum] && "Wrong topological sorting"' failed.
Stack dump:
0. Program arguments: ./llc -march=avr -o - -filetype=asm /home/dylan/projects/builds/llvm-project/llvm/foo.ll
1. Running pass 'Function Pass Manager' on module '/home/dylan/projects/builds/llvm-project/llvm/foo.ll'.
2. Running pass 'AVR DAG->DAG Instruction Selection' on function '@"core::num::dec2flt::num::digits_to_big"'
#0 0x00007f5841e1dd9a llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Unix/Signals.inc:499:3
#1 0x00007f5841e1c234 llvm::sys::RunSignalHandlers() /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Signals.cpp:67:5
#2 0x00007f5841e1c3b5 SignalHandler(int) /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Unix/Signals.inc:348:31
#3 0x00007f58414a44d0 __restore_rt (/usr/lib/libpthread.so.0+0x124d0)
#4 0x00007f584101482f __GI_raise (/usr/lib/libc.so.6+0x3782f)
#5 0x00007f5840fff672 __GI_abort (/usr/lib/libc.so.6+0x22672)
#6 0x00007f5840fff548 _nl_load_domain.cold.0 (/usr/lib/libc.so.6+0x22548)
#7 0x00007f584100cdb6 (/usr/lib/libc.so.6+0x2fdb6)
#8 0x00007f58422c3deb llvm::ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp:506:7
#9 0x00007f58424b1d75 (anonymous namespace)::ScheduleDAGRRList::Schedule() /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp:371:3
Using llvm trunk at 3a987cdf3
(r362773
)
./bin/llc -march=avr -o - -filetype=asm foo.ll
Gives
.text
.file "bugpoint-output-95356ba.bc"
.globl "core::num::dec2flt::num::digits_to_big" ; -- Begin function core::num::dec2flt::num::digits_to_big
.p2align 1
.type "core::num::dec2flt::num::digits_to_big",@function
"core::num::dec2flt::num::digits_to_big": ; @"core::num::dec2flt::num::digits_to_big"
; %bb.0: ; %start
ldi r24, 1
cpi r24, 0
breq LBB0_2
; %bb.1: ; %bb2.i
ret
LBB0_2: ; %start
ldi r24, 0
cpi r24, 0
brne LBB0_4
; %bb.3: ; %bb4.i
LBB0_4: ; %bb3.i
ld r24, Z
ldd r25, Z+1
ldd r18, Z+2
ldd r19, Z+3
subi r24, 48
sbci r25, 255
sbci r18, 255
sbci r19, 255
st Z, r24
std Z+1, r25
std Z+2, r18
std Z+3, r19
.Lfunc_end0:
.size "core::num::dec2flt::num::digits_to_big", .Lfunc_end0-"core::num::dec2flt::num::digits_to_big"
; -- End function
; Declaring this symbol tells the CRT that it should
;copy all variables from program memory to RAM on startup
.globl __do_copy_data
; Declaring this symbol tells the CRT that it should
;clear the zeroed data section on startup
.globl __do_clear_bss
So perhaps an LLVM upgrade will fix this?
I cherry-picked avr-rust/llvm-project@56d676b08b956051de3f781e7afd73365d62a7a9 and avr-rust/llvm-project@a5b647806b43a505da004f54e19b41476121c98b into the avr-support-aa35e73b
branch (#137), and this test no longer fails.