rust-legacy-fork icon indicating copy to clipboard operation
rust-legacy-fork copied to clipboard

[LLVM 6.0] flt2dec / dec2flt cause assertion failed: Wrong topological sorting

Open shepmaster opened this issue 6 years ago • 22 comments

This is with my recently-rebased branch based on upstream Rust 190a6c. I've added the patch from #99 locally.

Case 1

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-b213db3.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

; Function Attrs: uwtable
define align 1 void @core__num__flt2dec__strategy__dragon__mul_pow() unnamed_addr  {
start:
  %ret.i = alloca [40 x i32], align 1
  %0 = zext i32 undef to i64
  %1 = add nuw nsw i16 0, 11
  br label %bb19.i105.11

bb19.i105.11:                                     ; preds = %start
  %2 = getelementptr inbounds [40 x i32], [40 x i32]* %ret.i, i16 0, i16 %1
  %3 = load i32, i32* %2, align 1
  %4 = mul nuw i64 %0, 2788392729
  %5 = zext i32 %3 to i64
  %6 = add nuw nsw i64 0, %5
  %7 = add i64 %6, %4
  %8 = trunc i64 %7 to i32
  store i32 %8, i32* %2, align 1
  unreachable
}

Case 2

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-95356ba.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

%"num::bignum::Big32x40" = type { [0 x i8], i16, [0 x i8], [40 x i32], [0 x i8] }

; Function Attrs: uwtable
define void @"core::num::dec2flt::num::digits_to_big"() unnamed_addr {
start:
  %_7.sroa.0.0..sroa_idx.i = getelementptr inbounds %"num::bignum::Big32x40", %"num::bignum::Big32x40"* undef, i16 0, i32 3, i16 0
  switch i2 undef, label %bb5.i10 [
    i2 0, label %bb2.i
    i2 1, label %bb3.i
    i2 -2, label %bb4.i
  ]

bb2.i:                                            ; preds = %start
  ret void

bb3.i:                                            ; preds = %start
  %0 = add i8 0, -48
  %1 = zext i8 %0 to i32
  %2 = load i32, i32* %_7.sroa.0.0..sroa_idx.i, align 1
  %3 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %2, i32 %1)
  %4 = extractvalue { i32, i1 } %3, 0
  store i32 %4, i32* %_7.sroa.0.0..sroa_idx.i, align 1
  unreachable

bb4.i:                                            ; preds = %start
  unreachable

bb5.i10:                                          ; preds = %start
  unreachable
}

; Function Attrs: nounwind readnone speculatable
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)

Error

Assertion failed: (Node2Index[SU.NodeNum] > Node2Index[PD.getSUnit()->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /Users/shep/Projects/avr-rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp, line 518.

shepmaster avatar May 03 '18 02:05 shepmaster

/cc @brainlag

shepmaster avatar May 03 '18 02:05 shepmaster

Since I got my bugpoint setup working and fixed all my paths, I can now report a new cra-- oh. :smile:

One piece of addition information: I did not apply #99 and I still got this crash in exactly the same place.

jhwgh1968 avatar May 03 '18 03:05 jhwgh1968

I've been looking into this issue a bit (with a slightly different repro since I'm using a newer LLVM version). The issue appears to be that whatever data is passed to ScheduleDAGTopologicalSort, is not a DAG. The example where the assertion happens contains a cycle.

Note that this seems to be related to optimizations. Compiling with llc -O0 is fine, llc -O1 is not.

TimNN avatar Oct 08 '18 20:10 TimNN

In case that helps anyone, here is what I believe is the relevant part of the -debug output: https://gist.github.com/620098d70297404f3d02a8555363b916

The "LLVM ERROR: out of memory" seems easily explainable to me: LLVM apparently wants to compute the depth of the "DAG" for debug printing. Since the DAG contains a cycle, we get an infinite loop.

TimNN avatar Oct 08 '18 20:10 TimNN

If you compile llvm with asserts you will hit that and don't run into OOM error. I debugged this in the past and IIRC the problem is because 2 nodes in the DAG are glued together which lead to the cycle. If these 2 nodes would not glued together it would be fine.

brainlag avatar Oct 08 '18 21:10 brainlag

the problem is because 2 nodes in the DAG are glued together

I'm not quite sure what you mean with "glued together". In my case there is a cycle between three nodes (2, 3 and 6 in the Graph below).

Graph

graph

TimNN avatar Oct 08 '18 21:10 TimNN

There are glue nodes which glues 2 (or more?) nodes together While the IR code does not have a cycle, llvm wrongly fuses 2 instructions into 1 node which leads to a cycle in the DAG.

brainlag avatar Oct 08 '18 21:10 brainlag

Ah, I wasn't aware that "glue nodes" are an LLVM concept. Anyway, I think I understand now what you meant and see it for my example as well. So, IIUC, the next step would probably be figuring out why the two nodes are glued together.

In any case, I'm including the relevant graphs below, in case they are useful to anyone:

ISel Input

01-isel-input

Sched Input

02-sched-input

Scheduled

03-scheduled

TimNN avatar Oct 09 '18 19:10 TimNN

Looking at some more graphs, I believe that the issue occurs during the combine2 phase:

Before, we already have the glue node, but not yet a potential cycle. After combine2 we now have t40 and the potential cycle.

Before Combine 2

04-before-combine-2

After Combine 2

04-after-combine-2

TimNN avatar Oct 09 '18 19:10 TimNN

I think I know now what happens: In CombineToPostIndexedLoadStore, LLVM checks if t21 and t35 can be combined.

This may only be done if

Op must be independent of N, i.e. Op is neither a predecessor nor a successor of N. Otherwise, if Op is folded that would create a cycle.

The check for "!Op->isPredecessorOf(N) && !N->isPredecessorOf(Op)" then "correctly" identifies that they are independent, if one considers only the edges actually present in the graph.

The issue now is that the glue link between t28 and t29 will in effect be bidirectional (rather than unidirectional) once the two nodes are combined.

I can see three solutions to this:

  1. This could be considered a bug in CombineToPostIndexedLoadStore, thus CombineToPostIndexedLoadStore should be adapted to handle this case.
  2. Not generate the glue in the first place. (If it only exists for cosmetic / performance reasons, it could just be dropped. If it must be present, then I believe the glue would need to be replaced by a pseudo / virtual instruction).
  3. Ignore the glue in the schedule when it would lead to a cycle.

Intuitively, I don't think that (1) is feasible and don't like (3) as a solution. I'll try to figure out why / how / where the glue is inserted in the first place.

Edit: Any thoughts on the three proposed solutions?

TimNN avatar Oct 09 '18 20:10 TimNN

The glue nodes are generated by LLVM in DAGTypeLegalizer::ExpandIntRes_ADDSUB. Given that, I don't believe that solution (2) is easily feasible anymore, since any changes here would affect much more than just AVR.

TimNN avatar Oct 09 '18 20:10 TimNN

Imho, and i might be wrong, this is flaw in llvm. Either way I think you should discuss this with the llvm developers.

Also, when I looked up the code for CombineToPostIndexedLoadStore I found this.

brainlag avatar Oct 09 '18 20:10 brainlag

Yeah, I think you are correct that this can be considered an LLVM bug. I'll write to the mailing list or open a bug tomorrow.

Also, when I looked up the code for CombineToPostIndexedLoadStore I found this.

That commit should be a non-functional change. It just ensures that Op->isPredecessorOf(N) and N->isPredecessorOf(Op) don't do duplicate work.

TimNN avatar Oct 09 '18 20:10 TimNN

llvm-dev thread (or rather, initial message): http://lists.llvm.org/pipermail/llvm-dev/2018-October/126851.html

TimNN avatar Oct 10 '18 15:10 TimNN

Per the first reply on the ML, I've tried to disable the (add|sub)(c|e) instructions for AVR (patch) and that makes my reproduction compile.

I don't know yet if that breaks something else. Worstcase, I hope that this only makes things slower, which could be fixed by adding support for the (ADD|SUB)CARRY instructions.

Live-Edit: I've just ran the AVR CodeGen tests and took a closer look at add.ll. As far as I can tell, the generated assembly is much worse (for comparison, the assembly without the patch).

TimNN avatar Oct 10 '18 20:10 TimNN

And someone has a proposed fix for the glue

shepmaster avatar Oct 10 '18 21:10 shepmaster

We've recently upgraded to LLVM 8 🎉 I'm going to close any bug that is reported against an older version of LLVM. If you are still having this issue with the LLVM-8 based code, please ping me and I can reopen the issue!

shepmaster avatar Nov 03 '18 01:11 shepmaster

@shepmaster: This has a not-yet-upstreamed patch on the 8.0 branch, so I think we should keep the issue open?

TimNN avatar Nov 03 '18 08:11 TimNN

Built the official 8.0 release code with AVR target enabled, llced the code in OP, and it seems llc falls into a dead loop.

crlf0710 avatar Apr 14 '19 12:04 crlf0710

The way forward here is to convert the AVR backend from the old ADDE/ADDC/SUBE/SUBC DAG opcodes to the UADDO/ADDCARRY/USUBO/SUBCARRY DAG opcodes. The old ones have been deprecated for a while.

https://reviews.llvm.org/D53106#1443059

Summarizing offline discussion with jyknight for future reference:

This should only be necessary due to glued nodes that are not Copy{To,From}Reg and which have additional predecessors/successors than nodes on the glue. This appear to only occur in the case of ADDC/ADDE nodes have been deprecated.

This patch was originally written for such an instance in the AVR backend.

dylanmckay avatar Jun 07 '19 07:06 dylanmckay

Hm, I can't reproduce this on LLVM trunk anymore, on either testcase.

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-95356ba.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

%"num::bignum::Big32x40" = type { [0 x i8], i16, [0 x i8], [40 x i32], [0 x i8] }

; Function Attrs: uwtable
define void @"core::num::dec2flt::num::digits_to_big"() unnamed_addr {
start:
  %_7.sroa.0.0..sroa_idx.i = getelementptr inbounds %"num::bignum::Big32x40", %"num::bignum::Big32x40"* undef, i16 0, i32 3, i16 0
  switch i2 undef, label %bb5.i10 [
    i2 0, label %bb2.i
    i2 1, label %bb3.i
    i2 -2, label %bb4.i
  ]

bb2.i:                                            ; preds = %start
  ret void

bb3.i:                                            ; preds = %start
  %0 = add i8 0, -48
  %1 = zext i8 %0 to i32
  %2 = load i32, i32* %_7.sroa.0.0..sroa_idx.i, align 1
  %3 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %2, i32 %1)
  %4 = extractvalue { i32, i1 } %3, 0
  store i32 %4, i32* %_7.sroa.0.0..sroa_idx.i, align 1
  unreachable

bb4.i:                                            ; preds = %start
  unreachable

bb5.i10:                                          ; preds = %start
  unreachable
}

; Function Attrs: nounwind readnone speculatable
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)

Using avr-rust at avr-support-aa35e73b

./llc -march=avr -o - -filetype=asm ~/projects/builds/llvm-project/llvm/foo.ll

Gives

        .text                                                                                                                                                                                 
        .file   "bugpoint-output-95356ba.bc"                                                                                                                                                  
llc: /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp:507: void llvm::ScheduleDAGTopologicalSort::InitDAGTopologicalSorting(): Assertion `Node2Index[SU.NodeNum] > Node
2Index[PD.getSUnit()->NodeNum] && "Wrong topological sorting"' failed.                                                                                                                        
Stack dump:                                                                                                                                                                                   
0.      Program arguments: ./llc -march=avr -o - -filetype=asm /home/dylan/projects/builds/llvm-project/llvm/foo.ll                                                                           
1.      Running pass 'Function Pass Manager' on module '/home/dylan/projects/builds/llvm-project/llvm/foo.ll'.                                                                                
2.      Running pass 'AVR DAG->DAG Instruction Selection' on function '@"core::num::dec2flt::num::digits_to_big"'                                                                             
#0 0x00007f5841e1dd9a llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Unix/Signals.inc:499:3                                           
#1 0x00007f5841e1c234 llvm::sys::RunSignalHandlers() /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Signals.cpp:67:5                                                                 
#2 0x00007f5841e1c3b5 SignalHandler(int) /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Unix/Signals.inc:348:31                                                                      
#3 0x00007f58414a44d0 __restore_rt (/usr/lib/libpthread.so.0+0x124d0)                                                                                                                         
#4 0x00007f584101482f __GI_raise (/usr/lib/libc.so.6+0x3782f)
#5 0x00007f5840fff672 __GI_abort (/usr/lib/libc.so.6+0x22672)
#6 0x00007f5840fff548 _nl_load_domain.cold.0 (/usr/lib/libc.so.6+0x22548)
#7 0x00007f584100cdb6 (/usr/lib/libc.so.6+0x2fdb6)
#8 0x00007f58422c3deb llvm::ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp:506:7
#9 0x00007f58424b1d75 (anonymous namespace)::ScheduleDAGRRList::Schedule() /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp:371:3

Using llvm trunk at 3a987cdf3 (r362773)

./bin/llc -march=avr -o - -filetype=asm  foo.ll 

Gives

        .text
        .file   "bugpoint-output-95356ba.bc"
        .globl  "core::num::dec2flt::num::digits_to_big" ; -- Begin function core::num::dec2flt::num::digits_to_big
        .p2align        1
        .type   "core::num::dec2flt::num::digits_to_big",@function
"core::num::dec2flt::num::digits_to_big": ; @"core::num::dec2flt::num::digits_to_big"
; %bb.0:                                ; %start
        ldi     r24, 1
        cpi     r24, 0
        breq    LBB0_2
; %bb.1:                                ; %bb2.i
        ret
LBB0_2:                                 ; %start
        ldi     r24, 0
        cpi     r24, 0
        brne    LBB0_4
; %bb.3:                                ; %bb4.i
LBB0_4:                                 ; %bb3.i
        ld      r24, Z
        ldd     r25, Z+1
        ldd     r18, Z+2
        ldd     r19, Z+3
        subi    r24, 48
        sbci    r25, 255
        sbci    r18, 255
        sbci    r19, 255
        st      Z, r24
        std     Z+1, r25
        std     Z+2, r18
        std     Z+3, r19
.Lfunc_end0:
        .size   "core::num::dec2flt::num::digits_to_big", .Lfunc_end0-"core::num::dec2flt::num::digits_to_big"
                                        ; -- End function

        ; Declaring this symbol tells the CRT that it should
        ;copy all variables from program memory to RAM on startup
        .globl  __do_copy_data
        ; Declaring this symbol tells the CRT that it should
        ;clear the zeroed data section on startup
        .globl  __do_clear_bss

So perhaps an LLVM upgrade will fix this?

dylanmckay avatar Jun 07 '19 07:06 dylanmckay

I cherry-picked avr-rust/llvm-project@56d676b08b956051de3f781e7afd73365d62a7a9 and avr-rust/llvm-project@a5b647806b43a505da004f54e19b41476121c98b into the avr-support-aa35e73b branch (#137), and this test no longer fails.

dylanmckay avatar Jun 08 '19 14:06 dylanmckay