rust-legacy-fork
rust-legacy-fork copied to clipboard
Switch lookup tables are not properly placed in program memory
The input code here is a slight tweak of https://github.com/avr-rust/rust/issues/46, but the generated code is very different because it uses a lookup instead of bit-twiddling:
Rust:
#[inline(never)]
pub fn decode(n: u8) -> Option<()> {
match n {
0x0 => Some(()),
0x1 => Some(()),
0x2 => Some(()),
0x3 => Some(()),
0x4 => Some(()),
0xe => Some(()),
_ => None
}
}
LLVM IR:
.text
.file "external-05dffe7586b7d437.ll"
.globl _ZN8external6decode17h0bb113af5b7a3594E
.p2align 1
.type _ZN8external6decode17h0bb113af5b7a3594E,@function
_ZN8external6decode17h0bb113af5b7a3594E: ; @_ZN8external6decode17h0bb113af5b7a3594E
; BB#0: ; %start
cpi r24, 15
brlo .+2
rjmp LBB0_1
; BB#2: ; %switch.lookup
mov r26, r24
mov r27, r24
lsl r27
sbc r27, r27
subi r26, -lo8(switch.table)
sbci r27, -hi8(switch.table)
ld r24, X
ret
LBB0_1:
ldi r24, 0
ret
.Lfunc_end0:
.size _ZN8external6decode17h0bb113af5b7a3594E, .Lfunc_end0-_ZN8external6decode17h0bb113af5b7a3594E
.type switch.table,@object ; @switch.table
.section .rodata,"a",@progbits
switch.table:
.ascii "\001\001\001\001\001\000\000\000\000\000\000\000\000\000\001"
.size switch.table, 15
AVR assembly:
Contents of section .text:
0000 8f3008f0 00c0a82f b82fbb0f bb0ba050 .0....././.....P
0010 b0408c91 089580e0 0895 .@........
Contents of section .rodata:
0000 01010101 01000000 00000000 000001 ...............
Disassembly of section .text:
00000000 <_ZN8external6decode17h0bb113af5b7a3594E>:
0: 8f 30 cpi r24, 0x0F ; 15
2: 08 f0 brcs .+2 ; 0x6 <_ZN8external6decode17h0bb113af5b7a3594E+0x6>
4: 00 c0 rjmp .+0 ; 0x6 <_ZN8external6decode17h0bb113af5b7a3594E+0x6>
4: R_AVR_13_PCREL .text+0x16
6: a8 2f mov r26, r24
8: b8 2f mov r27, r24
a: bb 0f add r27, r27
c: bb 0b sbc r27, r27
e: a0 50 subi r26, 0x00 ; 0
e: R_AVR_LO8_LDI_NEG .rodata
10: b0 40 sbci r27, 0x00 ; 0
10: R_AVR_HI8_LDI_NEG .rodata
12: 8c 91 ld r24, X
14: 08 95 ret
00000016 <LBB0_1>:
16: 80 e0 ldi r24, 0x00 ; 0
18: 08 95 ret
In the fully linked .elf
, the table address in subi
/sbci
seems wrong to me:
Contents of section .data:
800100 01010101 01000000 00000000 00000100 ................
Disassembly of section .text:
...
000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E>:
b4: 8f 30 cpi r24, 0x0F ; 15
b6: 08 f0 brcs .+2 ; 0xba <_ZN8external6decode17hbbb21fc0c1cdbaf3E+0x6>
b8: 08 c0 rjmp .+16 ; 0xca <LBB0_1>
ba: a8 2f mov r26, r24
bc: b8 2f mov r27, r24
be: bb 0f add r27, r27
c0: bb 0b sbc r27, r27
c2: a0 50 subi r26, 0x00 ; 0
c4: bf 4f sbci r27, 0xFF ; 255
c6: 8c 91 ld r24, X
c8: 08 95 ret
Should these lookup tables even be generated?! https://github.com/avr-llvm/llvm/issues/88
Using a linker script, I moved the lookup table into .text
. The linked binary now looks more promising:
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00000000 00800100 000000e1 00000155 2**0
CONTENTS, ALLOC, LOAD, DATA
1 .text 000000e1 00000000 00000000 00000074 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .stab 000006cc 00000000 00000000 00000158 2**2
CONTENTS, READONLY, DEBUGGING
3 .stabstr 00000054 00000000 00000000 00000824 2**0
CONTENTS, READONLY, DEBUGGING
Disassembly of section .text:
000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E>:
b4: 8f 30 cpi r24, 0x0F ; 15
b6: 08 f0 brcs .+2 ; 0xba <_ZN8external6decode17hbb
b21fc0c1cdbaf3E+0x6>
b8: 08 c0 rjmp .+16 ; 0xca <LBB0_1>
ba: a8 2f mov r26, r24
bc: b8 2f mov r27, r24
be: bb 0f add r27, r27
c0: bb 0b sbc r27, r27
c2: ae 52 subi r26, 0x2E ; 46
c4: bf 4f sbci r27, 0xFF ; 255
c6: 8c 91 ld r24, X
c8: 08 95 ret
000000d2 <_etext>:
d2: 01 01 movw r0, r2
d4: 01 01 movw r0, r2
d6: 01 00 .word 0x0001 ; ????
...
e0: 01 02 Address 0x00000000000000e0 is out of bounds.
.word 0xffff ; ????
Where 0xff2e from 0xc2/0xc4 is exactly -0xe0, so this seems like it should work.
Unfortunately, decode(0x03)
is still returning None
.
Details of decode
going wrong with an input of 0x03
in r24
. Note that r26
and r27
seem to point to the right .text
address, but the ld
still results in 0.
Breakpoint 1, 0x000000b4 in external::decode::hbbb21fc0c1cdbaf3 ()
=> 0x000000b4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+0>: 8f 30 cpi r24, 0x0F ; 15
=> 0x000000b6 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+2>: 08 f0 brcs .+2 ; 0xba
=> 0x000000ba <_ZN8external6decode17hbbb21fc0c1cdbaf3E+6>: a8 2f mov r26, r24
=> 0x000000bc <_ZN8external6decode17hbbb21fc0c1cdbaf3E+8>: b8 2f mov r27, r24
=> 0x000000be <_ZN8external6decode17hbbb21fc0c1cdbaf3E+10>: bb 0f add r27, r27
=> 0x000000c0 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+12>: bb 0b sbc r27, r27
=> 0x000000c2 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+14>: ae 52 subi r26, 0x2E ; 46
=> 0x000000c4 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+16>: bf 4f sbci r27, 0xFF ; 255
=> 0x000000c6 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+18>: 8c 91 ld r24, X
r26 0xd5 213
r27 0x0 0
=> 0x000000c8 <_ZN8external6decode17hbbb21fc0c1cdbaf3E+20>: 08 95 ret
(gdb) i r r24
r24 0x0 0
My working theory now is that the .text
version is the one that should work, but we should be using LPM
instead of LD
.
I've fixed this locally by matching load
IR instructions with LPMRdZ
and linking .rodata
into .text
with a linking script.
This works for the test case from this ticket, but unfortunately doesn't readily generalize to 16-bit load
s, since the pseudo-instruction LPMWRdZ
is not implemented. Next step: see if I can copy-paste LDWRdPtr
's implementation.
LLVM IR was not actually attached in the description so here is what I get when I compile it locally
; ModuleID = 'test.cgu-0.rs'
source_filename = "test.cgu-0.rs"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr-atmel-none"
%"lib::option::Option<()>" = type { i8, [0 x i8], [0 x i8] }
; Function Attrs: noinline uwtable
define internal i8 @_ZN4test6decode17hcd01465f37da56a3E(i8) unnamed_addr #0 {
start:
%_0 = alloca %"lib::option::Option<()>"
switch i8 %0, label %bb7 [
i8 0, label %bb1
i8 1, label %bb2
i8 2, label %bb3
i8 3, label %bb4
i8 4, label %bb5
i8 14, label %bb6
]
bb1: ; preds = %start
%1 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
store i8 1, i8* %1
br label %bb8
bb2: ; preds = %start
%2 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
store i8 1, i8* %2
br label %bb8
bb3: ; preds = %start
%3 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
store i8 1, i8* %3
br label %bb8
bb4: ; preds = %start
%4 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
store i8 1, i8* %4
br label %bb8
bb5: ; preds = %start
%5 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
store i8 1, i8* %5
br label %bb8
bb6: ; preds = %start
%6 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
store i8 1, i8* %6
br label %bb8
bb7: ; preds = %start
%7 = getelementptr inbounds %"lib::option::Option<()>", %"lib::option::Option<()>"* %_0, i32 0, i32 0
store i8 0, i8* %7
br label %bb8
bb8: ; preds = %bb1, %bb2, %bb3, %bb4, %bb5, %bb6, %bb7
%8 = bitcast %"lib::option::Option<()>"* %_0 to i8*
%9 = load i8, i8* %8, align 1
ret i8 %9
}
; Function Attrs: nounwind uwtable
define void @rust_eh_personality() unnamed_addr #1 {
start:
ret void
}
; Function Attrs: uwtable
define internal void @_ZN4test4main17hf4d700377696cbb0E() unnamed_addr #2 {
start:
ret void
}
; Function Attrs: uwtable
define internal i16 @_ZN4test5start17h518035a39d5f5ca0E(i16, i8**) unnamed_addr #2 {
start:
ret i16 0
}
define i16 @main(i16, i8**) unnamed_addr {
top:
%2 = call i16 @_ZN4test5start17h518035a39d5f5ca0E(i16 %0, i8** %1)
ret i16 %2
}
attributes #0 = { noinline uwtable }
attributes #1 = { nounwind uwtable }
attributes #2 = { uwtable }
!llvm.module.flags = !{!0}
!0 = !{i32 1, !"PIE Level", i32 2}
Continuing conversation from #53
It's interesting because it looks like LLVM is making the switch -> lookup table optimisation.
That means that in order to correctly mark the global variable with the program memory address space, we will need to modify LLVM.
This completely untested/uncompiled hack would probably fix it
diff --git a/lib/Transforms/Utils/SimplifyCFG.cpp b/lib/Transforms/Utils/SimplifyCFG.cpp
index 7b0bddbbb83..d0506c2ef17 100644
--- a/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -4973,7 +4973,8 @@ SwitchLookupTable::SwitchLookupTable(
Array = new GlobalVariable(M, ArrayTy, /*constant=*/true,
GlobalVariable::PrivateLinkage, Initializer,
- "switch.table");
+ "switch.table", /*InsertBefore=*/ nullptr,
+ GlobalValue::NotThreadLocal, /*AddressSpace=*/1);
Array->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
Kind = ArrayKind;
}
Of course, if we wanted to upstream it, it would have to be more general purpose.
I can think of two solutions
- Add a new function to
TargetMachineInfo
that just creates aGlobalVariable
given all of the required arguments. Instead of building theGlobalVariable
here, we could call the new function. We can then override this function specifically for AVR and set the address space there. - Add a custom pass that runs after
simplify-cfg
that enumerates all global variables, inspecting if they containSwitchTable
objects, and sets the appropriate address space. This solution sucks.
I've been looking into fixing this properly via adding target-specific hooks to the TargetTransformInfo
class.
I've been thinking about it and I don't think these lookup tables are a special case.
The switch table itself is just a global variable, i.e.
@switch.table.foo = private unnamed_addr constant [15 x i8] c"\01\01\01\01\01\00\00\00\00\00\00\00\00\00\01"
Now, there is nothing special about this - it is just an array of characters. In general, we should always be able to load things from RAM using the ld
family of instructions.
Now, at chip startup, RAM is completely empty and all variables are stored in program memory. What should happen is that the startup code copies all variables that should exist in RAM, into the RAM. Now this should also include the switch table, which shouldn't work any different.
Now if we use ld
to load a switch table in data space (address space #0, the default) and use it to lookup a value, it should work fine given that the lookup table is correctly copied to RAM.
Granted, we really would want switch tables to exist in program memory because it's just static branching data. I will finish writing my patch because it's good anyway, but I think there must be something deeper afoot.
Here is where I'm getting my information, questions 1-3.
Opened D34983.
I am 100% percent against any solution that involves mindlessly copying switch lookup tables into RAM just so we can use ld
to load them.
These lookup tables are generated by the compiler. On MCUs that have 512 bytes of RAM or even less, having them in RAM is definitely observable behaviour -- an observable leak of compiler behaviour.
We have everything we need to know, end to end, to avoid this copying: the Rust compiler synthesizes these globals and it also generates the LLVM IR for accessing them. This cannot be an unsolvable problem.
I completely agree, having lookup tables copied into ram would be a very ugly wart.
My point is that I can fix switch tables but if we currently cannot load those globals, are we even loading globals correctly now?
My patch completely fixes the switch table issue by the way, so they they only exist and are correctly loaded from program memory.
are we even loading globals correctly now?
No, I don't think we are, but I think that responsibility is outside of the compiler proper. I have some old hacked up code locally that adds some linker things to get the size of the global section and then copies it out into RAM before Rust's main
starts. I'm pretty sure that GCCs startup files do the same thing.
Do you know if this functionality is a part of the CRT? If so, that means we will get it for free if we link correctly with avr-gcc's CRT library.
Do you know if this functionality is a part of the CRT?
I think so, but to be honest I don't fully recall and I'm not even sure I have a great distinction of what is and is not the CRT.
if we link correctly with avr-gcc's CRT library.
That would be the trick, wouldn't it? :-)
In case it wasn't clear, I really want to write as much of that in Rust as possible, so I'm always trying to remove any GCC bits ;-)
I think there are a few unresolved questions with switch tables:
- Switch tables cannot be placed in memory on MCUs without the load program memory (
lpm
) instruction - Therefore, if we want switch lookup tables in program memory, we must still support loading from them from RAM
I think the easiest way forward is to disable switch lookup tables on AVR until these two problems are resolved (#47 and #69).
I have a WIP branch here.