Particular switch optimises poorly for Arm vs. x86-64
Upstream issue: https://github.com/rust-lang/rust/issues/98157.
LLVM doesn't optimise the referenced example for Arm (or possibly non-x86 targets in general) as well as it does for x86_64. In a quick test, "i686-unknown-linux-gnu" and "x86_64-unknown-linux-gnu" elide the switch table, whereas "arm-unknown-linux-gnueabi", "armv7-unknown-linux-gnueabi" and "aarch64-unknown-linux-gnu" keep the table.
This doesn't seem to inherently be a rustc issue, as it emits the same unoptimised LLVM IR for either platform, the only relevant part of which seems to be as follows:
@0 = private unnamed_addr constant <{ [4 x i8] }> <{ [4 x i8] c"\0C\0D\0E\0F" }>, align 1
@1 = private unnamed_addr constant <{ [4 x i8] }> <{ [4 x i8] c"\08\09\0A\0B" }>, align 1
@2 = private unnamed_addr constant <{ [4 x i8] }> <{ [4 x i8] c"\04\05\06\07" }>, align 1
@3 = private unnamed_addr constant <{ [4 x i8] }> <{ [4 x i8] c"\00\01\02\03" }>, align 1
; ...
; Function Attrs: nonlazybind uwtable
define i32 @f(i8 %n) unnamed_addr #1 !dbg !14 {
start:
%0 = alloca [4 x i8], align 1
%_2 = urem i8 %n, 4, !dbg !17
switch i8 %_2, label %bb1 [
i8 0, label %bb2
i8 1, label %bb3
i8 2, label %bb4
i8 3, label %bb5
], !dbg !18
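For readers without the upstream issue open, here is a minimal sketch of Rust source that would plausibly lower to IR of this shape. It is a hypothetical reconstruction, not the code from the upstream issue: the mapping of each case to a byte array and the use of `u32::from_le_bytes` are assumptions, since the excerpt above does not show the bodies of `bb2` to `bb5`.

```rust
// Hypothetical reconstruction, NOT the source from the upstream issue.
// Assumed: case 0 selects [0, 1, 2, 3], case 1 selects [4, 5, 6, 7], and so on,
// and the bytes are combined in little-endian order.
pub fn f(n: u8) -> u32 {
    // `n % 4` corresponds to `urem i8 %n, 4` in the IR.
    let bytes: [u8; 4] = match n % 4 {
        0 => [0, 1, 2, 3],
        1 => [4, 5, 6, 7],
        2 => [8, 9, 10, 11],
        3 => [12, 13, 14, 15],
        // Corresponds to the default label `%bb1`; never taken because
        // `n % 4` is always in 0..=3.
        _ => unreachable!(),
    };
    u32::from_le_bytes(bytes)
}
```

The default arm is the interesting part: the front end still emits it, so removing it is left to LLVM proving that the remainder can only be 0, 1, 2 or 3.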
I suspect it has something to do with i8 being a legal type on X86 and not on many other targets.
Right before SimplifyCFG the X86 version of the switch is using i8 types, but other targets are using i2. https://godbolt.org/z/P8Tojs8jf Maybe this messed up SwitchLookupTable::SwitchLookupTable in SimplifyCFG?
Yes, the type seems to make the difference: https://llvm.godbolt.org/z/6E9Mqz3of
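To make concrete what "keeping the table" versus "eliding the table" can mean here, the sketch below contrasts the two forms for the four constants in the IR above. It rests on assumptions: the constants are read as little-endian 32-bit values, and case `k` selects the `k`-th constant. Under those assumptions the values form an arithmetic progression with step `0x04040404`, so the lookup can be replaced by a multiply and an add. Whether this is the exact form LLVM emits on x86 is not stated in this thread. The names `TABLE`, `f_table` and `f_linear` are hypothetical.

```rust
// The four constants from the IR, read as little-endian u32 values (assumption).
const TABLE: [u32; 4] = [0x0302_0100, 0x0706_0504, 0x0B0A_0908, 0x0F0E_0D0C];

// "Table kept": index into a constant array in memory.
pub fn f_table(n: u8) -> u32 {
    TABLE[(n % 4) as usize]
}

// "Table elided": consecutive entries differ by 0x04040404, so the result
// can be computed as base + index * step without any memory load.
pub fn f_linear(n: u8) -> u32 {
    0x0302_0100 + u32::from(n % 4) * 0x0404_0404
}
```

Both functions return the same value for every `u8` input. The arithmetic cannot overflow, because the largest result is `0x03020100 + 3 * 0x04040404 = 0x0F0E0D0C`.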
Candidate patch: https://reviews.llvm.org/D135982