rust-legacy-fork icon indicating copy to clipboard operation
rust-legacy-fork copied to clipboard

Switch lookup tables are not properly placed in program memory

Open gergoerdi opened this issue 7 years ago • 18 comments

The input code here is a slight tweak of https://github.com/avr-rust/rust/issues/46, but the generated code is very different because it uses a lookup instead of bit-twiddling:

Rust:

#[inline(never)]
pub fn decode(n: u8) -> Option<()> {
    match n {
        0x0 => Some(()),
        0x1 => Some(()),
        0x2 => Some(()),
        0x3 => Some(()),
        0x4 => Some(()),
        0xe => Some(()),
        _ => None
    }
}

LLVM IR:

        .text
        .file   "external-05dffe7586b7d437.ll"
        .globl  _ZN8external6decode17h0bb113af5b7a3594E
        .p2align        1
        .type   _ZN8external6decode17h0bb113af5b7a3594E,@function
_ZN8external6decode17h0bb113af5b7a3594E: ; @_ZN8external6decode17h0bb113af5b7a3594E
; BB#0:                                 ; %start
        cpi     r24, 15
        brlo    .+2
        rjmp    LBB0_1
; BB#2:                                 ; %switch.lookup
        mov     r26, r24
        mov     r27, r24
        lsl     r27
        sbc     r27, r27
        subi    r26, -lo8(switch.table)
        sbci    r27, -hi8(switch.table)
        ld      r24, X
        ret
LBB0_1:
        ldi     r24, 0
        ret
.Lfunc_end0:
        .size   _ZN8external6decode17h0bb113af5b7a3594E, .Lfunc_end0-_ZN8external6decode17h0bb113af5b7a3594E

        .type   switch.table,@object    ; @switch.table
        .section        .rodata,"a",@progbits
switch.table:
        .ascii  "\001\001\001\001\001\000\000\000\000\000\000\000\000\000\001"
        .size   switch.table, 15

AVR assembly:

Contents of section .text:
 0000 8f3008f0 00c0a82f b82fbb0f bb0ba050  .0....././.....P
 0010 b0408c91 089580e0 0895               .@........      
Contents of section .rodata:
 0000 01010101 01000000 00000000 000001    ............... 

Disassembly of section .text:

00000000 <_ZN8external6decode17h0bb113af5b7a3594E>:
   0:   8f 30           cpi     r24, 0x0F       ; 15
   2:   08 f0           brcs    .+2             ; 0x6 <_ZN8external6decode17h0bb113af5b7a3594E+0x6>
   4:   00 c0           rjmp    .+0             ; 0x6 <_ZN8external6decode17h0bb113af5b7a3594E+0x6>
                        4: R_AVR_13_PCREL       .text+0x16
   6:   a8 2f           mov     r26, r24
   8:   b8 2f           mov     r27, r24
   a:   bb 0f           add     r27, r27
   c:   bb 0b           sbc     r27, r27
   e:   a0 50           subi    r26, 0x00       ; 0
                        e: R_AVR_LO8_LDI_NEG    .rodata
  10:   b0 40           sbci    r27, 0x00       ; 0
                        10: R_AVR_HI8_LDI_NEG   .rodata
  12:   8c 91           ld      r24, X
  14:   08 95           ret

00000016 <LBB0_1>:
  16:   80 e0           ldi     r24, 0x00       ; 0
  18:   08 95           ret

gergoerdi avatar May 06 '17 07:05 gergoerdi

In the fully linked .elf, the table address in subi/sbci seems wrong to me:

Contents of section .data:
 800100 01010101 01000000 00000000 00000100  ................

Disassembly of section .text:
...
000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E>:
  b4:   8f 30           cpi     r24, 0x0F       ; 15
  b6:   08 f0           brcs    .+2             ; 0xba <_ZN8external6decode17hbbb21fc0c1cdbaf3E+0x6>
  b8:   08 c0           rjmp    .+16            ; 0xca <LBB0_1>
  ba:   a8 2f           mov     r26, r24
  bc:   b8 2f           mov     r27, r24
  be:   bb 0f           add     r27, r27
  c0:   bb 0b           sbc     r27, r27
  c2:   a0 50           subi    r26, 0x00       ; 0
  c4:   bf 4f           sbci    r27, 0xFF       ; 255
  c6:   8c 91           ld      r24, X
  c8:   08 95           ret

gergoerdi avatar May 06 '17 08:05 gergoerdi

Should these lookup tables even be generated?! https://github.com/avr-llvm/llvm/issues/88

gergoerdi avatar May 06 '17 09:05 gergoerdi

Using a linker script, I moved the lookup table into .text. The linked binary now looks more promising:

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .data         00000000  00800100  000000e1  00000155  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  1 .text         000000e1  00000000  00000000  00000074  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .stab         000006cc  00000000  00000000  00000158  2**2
                  CONTENTS, READONLY, DEBUGGING
  3 .stabstr      00000054  00000000  00000000  00000824  2**0
                  CONTENTS, READONLY, DEBUGGING
Disassembly of section .text:

000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E>:
  b4:   8f 30           cpi     r24, 0x0F       ; 15
  b6:   08 f0           brcs    .+2             ; 0xba <_ZN8external6decode17hbb
b21fc0c1cdbaf3E+0x6>
  b8:   08 c0           rjmp    .+16            ; 0xca <LBB0_1>
  ba:   a8 2f           mov     r26, r24
  bc:   b8 2f           mov     r27, r24
  be:   bb 0f           add     r27, r27
  c0:   bb 0b           sbc     r27, r27
  c2:   ae 52           subi    r26, 0x2E       ; 46
  c4:   bf 4f           sbci    r27, 0xFF       ; 255
  c6:   8c 91           ld      r24, X
  c8:   08 95           ret

000000d2 <_etext>:
  d2:   01 01           movw    r0, r2
  d4:   01 01           movw    r0, r2
  d6:   01 00           .word   0x0001  ; ????
        ...
  e0:   01 02           Address 0x00000000000000e0 is out of bounds.
.word   0xffff  ; ????

Where 0xff2e from 0xc2/0xc4 is exactly -0xe0, so this seems like it should work.

Unfortunately, decode(0x03) is still returning None.

gergoerdi avatar May 06 '17 13:05 gergoerdi

Details of decode going wrong with an input of 0x03 in r24. Note that r26 and r27 seem to point to the right .text address, but the ld still results in 0.

Breakpoint 1, 0x000000b4 in external::decode::hbbb21fc0c1cdbaf3 ()
=> 0x000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+0>:	8f 30	cpi	r24, 0x0F	; 15
=> 0x000000b6 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+2>:	08 f0	brcs	.+2      	;  0xba
=> 0x000000ba <_ZN8external6decode17hbbb21fc0c1cdbaf3E+6>:	a8 2f	mov	r26, r24
=> 0x000000bc <_ZN8external6decode17hbbb21fc0c1cdbaf3E+8>:	b8 2f	mov	r27, r24
=> 0x000000be <_ZN8external6decode17hbbb21fc0c1cdbaf3E+10>:	bb 0f	add	r27, r27
=> 0x000000c0 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+12>:	bb 0b	sbc	r27, r27
=> 0x000000c2 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+14>:	ae 52	subi	r26, 0x2E	; 46
=> 0x000000c4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+16>:	bf 4f	sbci	r27, 0xFF	; 255
=> 0x000000c6 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+18>:	8c 91	ld	r24, X
r26            0xd5	213
r27            0x0	0
=> 0x000000c8 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+20>:	08 95	ret
(gdb) i r r24
r24            0x0	0

gergoerdi avatar May 06 '17 13:05 gergoerdi

My working theory now is that the .text version is the one that should work, but we should be using LPM instead of LD.

gergoerdi avatar May 06 '17 13:05 gergoerdi

I've fixed this locally by matching load IR instructions with LPMRdZ and linking .rodata into .text with a linking script.

This works for the test case from this ticket, but unfortunately doesn't readily generalize to 16-bit loads, since the pseudo-instruction LPMWRdZ is not implemented. Next step: see if I can copy-paste LDWRdPtr's implementation.

gergoerdi avatar May 07 '17 04:05 gergoerdi

LLVM IR was not actually attached in the description so here is what I get when I compile it locally

; ModuleID = 'test.cgu-0.rs'
source_filename = "test.cgu-0.rs"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr-atmel-none"

%"lib::option::Option<()>" = type { i8, [0 x i8], [0 x i8] }

; Function Attrs: noinline uwtable
define internal i8 @_ZN4test6decode17hcd01465f37da56a3E(i8) unnamed_addr #0 {
start:
  %_0 = alloca %"lib::option::Option<()>"
  switch i8 %0, label %bb7 [
    i8 0, label %bb1
    i8 1, label %bb2
    i8 2, label %bb3
    i8 3, label %bb4
    i8 4, label %bb5
    i8 14, label %bb6
  ]

bb1:                                              ; preds = %start
  %1 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %1
  br label %bb8

bb2:                                              ; preds = %start
  %2 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %2
  br label %bb8

bb3:                                              ; preds = %start
  %3 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %3
  br label %bb8

bb4:                                              ; preds = %start
  %4 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %4
  br label %bb8

bb5:                                              ; preds = %start
  %5 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %5
  br label %bb8

bb6:                                              ; preds = %start
  %6 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 1, i8* %6
  br label %bb8

bb7:                                              ; preds = %start
  %7 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
  store i8 0, i8* %7
  br label %bb8

bb8:                                              ; preds = %bb1, %bb2, %bb3, %bb4, %bb5, %bb6, %bb7
  %8 = bitcast %"lib::option::Option<()>"* %_0 to i8*
  %9 = load i8, i8* %8, align 1
  ret i8 %9
}

; Function Attrs: nounwind uwtable
define void @rust_eh_personality() unnamed_addr #1 {
start:
  ret void
}

; Function Attrs: uwtable
define internal void @_ZN4test4main17hf4d700377696cbb0E() unnamed_addr #2 {
start:
  ret void
}

; Function Attrs: uwtable
define internal i16 @_ZN4test5start17h518035a39d5f5ca0E(i16, i8**) unnamed_addr #2 {
start:
  ret i16 0
}

define i16 @main(i16, i8**) unnamed_addr {
top:
  %2 = call i16 @_ZN4test5start17h518035a39d5f5ca0E(i16 %0, i8** %1)
  ret i16 %2
}

attributes #0 = { noinline uwtable }
attributes #1 = { nounwind uwtable }
attributes #2 = { uwtable }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"PIE Level", i32 2}

dylanmckay avatar May 13 '17 09:05 dylanmckay

Continuing conversation from #53

It's interesting because it looks like LLVM is making the switch -> lookup table optimisation.

That means that in order to correctly mark the global variable with the program memory address space, we will need to modify LLVM.

dylanmckay avatar May 13 '17 09:05 dylanmckay

This completely untested/uncompiled hack would probably fix it

diff --git a/lib/Transforms/Utils/SimplifyCFG.cpp b/lib/Transforms/Utils/SimplifyCFG.cpp
index 7b0bddbbb83..d0506c2ef17 100644
--- a/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -4973,7 +4973,8 @@ SwitchLookupTable::SwitchLookupTable(
 
   Array = new GlobalVariable(M, ArrayTy, /*constant=*/true,
                              GlobalVariable::PrivateLinkage, Initializer,
-                             "switch.table");
+                             "switch.table", /*InsertBefore=*/ nullptr,
+                             GlobalValue::NotThreadLocal, /*AddressSpace=*/1);
   Array->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
   Kind = ArrayKind;
 }

Of course, if we wanted to upstream it, it would have to be more general purpose.

I can think of two solutions

  • Add a new function to TargetMachineInfo that just creates a GlobalVariable given all of the required arguments. Instead of building the GlobalVariable here, we could call the new function. We can then override this function specifically for AVR and set the address space there.
  • Add a custom pass that runs after simplify-cfgthat enumerates all global variables, inspecting if they contain SwitchTable objects, and sets the appropriate address space. This solution sucks.

dylanmckay avatar May 13 '17 09:05 dylanmckay

I've been looking into fixing this properly via adding target-specific hooks to the TargetTransformInfo class.

I've been thinking about it and I don't think these lookup tables are a special case.

The switch table itself is just a global variable, i.e.

@switch.table.foo = private unnamed_addr constant [15 x i8] c"\01\01\01\01\01\00\00\00\00\00\00\00\00\00\01"

Now, there is nothing special about this - it is just an array of characters. In general, we should always be able to load things from RAM using the ld family of instructions.

Now, at chip startup, RAM is completely empty and all variables are stored in program memory. What should happen is that the startup code copies all variables that should exist in RAM, into the RAM. Now this should also include the switch table, which shouldn't work any different.

Now if we use ld to load a switch table in data space (address space #0, the default) and use it to lookup a value, it should work fine given that the lookup table is correctly copied to RAM.

Granted, we really would want switch tables to exist in program memory because it's just static branching data. I will finish writing my patch because it's good anyway, but I think there must be something deeper afoot.

Here is where I'm getting my information, questions 1-3.

dylanmckay avatar Jul 04 '17 14:07 dylanmckay

Opened D34983.

dylanmckay avatar Jul 04 '17 14:07 dylanmckay

I am 100% percent against any solution that involves mindlessly copying switch lookup tables into RAM just so we can use ld to load them.

These lookup tables are generated by the compiler. On MCUs that have 512 bytes of RAM or even less, having them in RAM is definitely observable behaviour -- an observable leak of compiler behaviour.

We have everything we need to know, end to end, to avoid this copying: the Rust compiler synthesizes these globals and it also generates the LLVM IR for accessing them. This cannot be an unsolvable problem.

gergoerdi avatar Jul 04 '17 15:07 gergoerdi

I completely agree, having lookup tables copied into ram would be a very ugly wart.

My point is that I can fix switch tables but if we currently cannot load those globals, are we even loading globals correctly now?

My patch completely fixes the switch table issue by the way, so they they only exist and are correctly loaded from program memory.

dylanmckay avatar Jul 04 '17 15:07 dylanmckay

are we even loading globals correctly now?

No, I don't think we are, but I think that responsibility is outside of the compiler proper. I have some old hacked up code locally that adds some linker things to get the size of the global section and then copies it out into RAM before Rust's main starts. I'm pretty sure that GCCs startup files do the same thing.

shepmaster avatar Jul 05 '17 23:07 shepmaster

Do you know if this functionality is a part of the CRT? If so, that means we will get it for free if we link correctly with avr-gcc's CRT library.

dylanmckay avatar Jul 06 '17 07:07 dylanmckay

Do you know if this functionality is a part of the CRT?

I think so, but to be honest I don't fully recall and I'm not even sure I have a great distinction of what is and is not the CRT.

if we link correctly with avr-gcc's CRT library.

That would be the trick, wouldn't it? :-)

In case it wasn't clear, I really want to write as much of that in Rust as possible, so I'm always trying to remove any GCC bits ;-)

shepmaster avatar Jul 08 '17 18:07 shepmaster

I think there are a few unresolved questions with switch tables:

  • Switch tables cannot be placed in memory on MCUs without the load program memory (lpm) instruction
  • Therefore, if we want switch lookup tables in program memory, we must still support loading from them from RAM

I think the easiest way forward is to disable switch lookup tables on AVR until these two problems are resolved (#47 and #69).

dylanmckay avatar Jan 21 '19 05:01 dylanmckay

I have a WIP branch here.

dylanmckay avatar Jan 21 '19 05:01 dylanmckay