Odin
Codegen error with AVX-512BW
Context
Odin: dev-2024-10-nightly
OS: Arch Linux, Linux 6.11.2-zen1-1-zen
CPU: AMD Ryzen 9 9950X 16-Core Processor
RAM: 61886 MiB
Backend: LLVM 18.1.6
Expected Behavior
The code in the snippet below should compile without issues, and should execute without issues if AVX-512BW is available on the machine.
Current Behavior
When building the code in the snippet below (and other similarly-constructed code involving masks), an LLVM error (see below) is produced and the Odin compiler aborts. This only happens when relevant parts of the AVX-512 instruction set are enabled (in this case avx512bw), either via an attribute or via the command-line. When enabling other SIMD instruction sets (e.g. avx2), the code builds without issue.
In the sample code below, this also occurs when swapping main for a test procedure with the same body and attempting to run tests (odin test).
Failure Information (for bugs)
Example error:
LLVM ERROR: Cannot select: 0x740fd819bd00: v16i1 = setcc 0x740fd819b830, 0x740fd819c320, setgt:ch
0x740fd819b830: v16i16 = sub 0x740fd819b6e0, 0x740fd819b7c0
0x740fd819b6e0: v16i16,ch = load<(load (s256) from %ir.0 + 32, basealign 64)> 0x740fd819b590, 0x740fd819bfa0, undef:i64
0x740fd819bfa0: i64 = add 0x740fd819b8a0, Constant:i64<32>
0x740fd819b8a0: i64,ch = CopyFromReg 0x740fd8ae4360, Register:i64 %1
0x740fd819b9f0: i64 = Register %1
0x740fd819c080: i64 = Constant<32>
0x740fd819c0f0: i64 = undef
0x740fd819b7c0: v16i16,ch = load<(load (s256) from constant-pool)> 0x740fd8ae4360, 0x740fd819b520, undef:i64
0x740fd819b520: i64 = X86ISD::Wrapper TargetConstantPool:i64<<16 x i16> <i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384>> 0
0x740fd819bf30: i64 = TargetConstantPool<<16 x i16> <i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384>> 0
0x740fd819c0f0: i64 = undef
0x740fd819c320: v16i16 = bitcast 0x740fd819c160
0x740fd819c160: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
0x740fd819c240: i32 = Constant<-1>
0x740fd819c240: i32 = Constant<-1>
0x740fd819c240: i32 = Constant<-1>
0x740fd819c240: i32 = Constant<-1>
0x740fd819c240: i32 = Constant<-1>
0x740fd819c240: i32 = Constant<-1>
0x740fd819c240: i32 = Constant<-1>
0x740fd819c240: i32 = Constant<-1>
In function: mre.foo
fish: Job 1, '~/Downloads/odin-linux-amd64-de…' terminated by signal SIGABRT (Abort)
Pointer values change with each build.
Steps to Reproduce
- Create an Odin source file mre.odin with the following code:
package mre

import "core:simd"

@(enable_target_feature = "avx512bw")
foo :: proc(src: #simd[32]u16, dst: ^[32]u16) {
	simd.masked_store(dst, src, simd.lanes_lt(src - auto_cast 16384, auto_cast 32768))
}

main :: proc() {
	a : [32]u16
	foo({}, &a)
}
- Attempt to build this file (odin build mre.odin -file).
This error also occurs if the enable_target_feature attribute is removed and the target feature is enabled via the command line (-target-features:avx512bw). The error seems to be highly dependent on compiler flags: it does not occur if -o:size, -o:speed, or -o:aggressive is given, and it only seems to occur with some microarches (e.g. the default x86-64-v2 and x86-64-v3 fail, while x86-64 and x86-64-v4 work).
Looks like you also need avx512vl enabled to make codegen happy.
Could be this, which they say may be fixed in LLVM 19: https://github.com/llvm/llvm-project/issues/111380
There are a lot of different ways to make the codegen happy, too. Setting optimization flags sometimes does it, changing the microarch sometimes does it (even if it's one that doesn't support AVX-512)... probably others too. This is extremely sensitive to compiler flags.
The optimization modes making a difference makes sense: the optimizer probably just removes the entire function, because the program doesn't have any side effects. It is very hard to get LLVM (with optimizations) to behave in a bug reproduction because it can just remove things.
And even if you get it to not remove your function it could be optimizing it to very different instructions.
The microarch affecting it is a little weird, especially if it doesn't enable the avx512 features.
Per recent findings, it seems like the issue here is that LLVM requires either the evex512 or avx512vl target feature to be able to use AVX-512 instructions. The evex512 feature seems to enable use of the zmm registers with AVX-512 instructions, whereas avx512vl seems to enable use of the xmm/ymm registers with AVX-512 instructions.
Without one or the other, there are no registers that can be used with AVX-512 instructions.
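For anyone hitting this in the meantime, here is a sketch of the workaround suggested above: the same reproduction with avx512vl enabled alongside avx512bw. This assumes enable_target_feature accepts a comma-separated feature list, and it has not been checked against every microarch and flag combination mentioned in this thread.

```odin
package mre

import "core:simd"

// Assumption: enabling avx512vl together with avx512bw lets LLVM use
// xmm/ymm registers with AVX-512 instructions, which should avoid the
// "Cannot select" error for the v16i16 compare.
@(enable_target_feature = "avx512bw,avx512vl")
foo :: proc(src: #simd[32]u16, dst: ^[32]u16) {
	simd.masked_store(dst, src, simd.lanes_lt(src - auto_cast 16384, auto_cast 32768))
}

main :: proc() {
	a: [32]u16
	foo({}, &a)
}
```

The command-line equivalent would presumably be -target-features:avx512bw,avx512vl in place of the attribute.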
I have to step out for a bit, but I'll try and address it per the discussion on Discord when I get back.