Codegen error with AVX-512BW

Open Barinzaya opened this issue 1 year ago • 4 comments

Context

	Odin:    dev-2024-10-nightly
	OS:      Arch Linux, Linux 6.11.2-zen1-1-zen
	CPU:     AMD Ryzen 9 9950X 16-Core Processor
	RAM:     61886 MiB
	Backend: LLVM 18.1.6

Expected Behavior

The code in the snippet below should compile without issues, and should execute without issues if AVX-512BW is available on the machine.

Current Behavior

When building the code in the snippet below (and other similarly-constructed code involving masks), an LLVM error (see below) is produced and the Odin compiler aborts. This only happens when relevant parts of the AVX-512 instruction set are enabled (in this case avx512bw), either via an attribute or via the command-line. When enabling other SIMD instruction sets (e.g. avx2), the code builds without issue.

In the sample code below, this also occurs when swapping main for a test procedure with the same body and attempting to run tests (odin test).

Failure Information (for bugs)

Example error:

LLVM ERROR: Cannot select: 0x740fd819bd00: v16i1 = setcc 0x740fd819b830, 0x740fd819c320, setgt:ch
  0x740fd819b830: v16i16 = sub 0x740fd819b6e0, 0x740fd819b7c0
    0x740fd819b6e0: v16i16,ch = load<(load (s256) from %ir.0 + 32, basealign 64)> 0x740fd819b590, 0x740fd819bfa0, undef:i64
      0x740fd819bfa0: i64 = add 0x740fd819b8a0, Constant:i64<32>
        0x740fd819b8a0: i64,ch = CopyFromReg 0x740fd8ae4360, Register:i64 %1
          0x740fd819b9f0: i64 = Register %1
        0x740fd819c080: i64 = Constant<32>
      0x740fd819c0f0: i64 = undef
    0x740fd819b7c0: v16i16,ch = load<(load (s256) from constant-pool)> 0x740fd8ae4360, 0x740fd819b520, undef:i64
      0x740fd819b520: i64 = X86ISD::Wrapper TargetConstantPool:i64<<16 x i16> <i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384>> 0
        0x740fd819bf30: i64 = TargetConstantPool<<16 x i16> <i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384, i16 16384>> 0
      0x740fd819c0f0: i64 = undef
  0x740fd819c320: v16i16 = bitcast 0x740fd819c160
    0x740fd819c160: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
      0x740fd819c240: i32 = Constant<-1>
      0x740fd819c240: i32 = Constant<-1>
      0x740fd819c240: i32 = Constant<-1>
      0x740fd819c240: i32 = Constant<-1>
      0x740fd819c240: i32 = Constant<-1>
      0x740fd819c240: i32 = Constant<-1>
      0x740fd819c240: i32 = Constant<-1>
      0x740fd819c240: i32 = Constant<-1>
In function: mre.foo
fish: Job 1, '~/Downloads/odin-linux-amd64-de…' terminated by signal SIGABRT (Abort)

Pointer values change with each build.

Steps to Reproduce

  1. Create an Odin source file mre.odin with the following code:
package mre

import "core:simd"

@(enable_target_feature = "avx512bw")
foo :: proc(src: #simd[32]u16, dst: ^[32]u16) {
	simd.masked_store(dst, src, simd.lanes_lt(src - auto_cast 16384, auto_cast 32768))
}

main :: proc() {
	a : [32]u16
	foo({}, &a)
}
  2. Attempt to build this file (odin build mre.odin -file).

This error also occurs if the enable_target_feature attribute is removed and the target feature is enabled via the command line (-target-features:avx512bw). It seems to be highly dependent on compiler flags: it does not occur if -o:size, -o:speed, or -o:aggressive is given, and it only seems to occur with some microarches (e.g. the default x86-64-v2 and x86-64-v3 fail, while x86-64 and x86-64-v4 work).

Barinzaya avatar Oct 14 '24 14:10 Barinzaya

Looks like you also need avx512vl enabled to make codegen happy.

laytan avatar Oct 15 '24 07:10 laytan

Could be this, which they say may be fixed in LLVM 19: https://github.com/llvm/llvm-project/issues/111380

laytan avatar Oct 15 '24 07:10 laytan

> Looks like you also need avx512vl enabled to make codegen happy.

There are a lot of different ways to make the codegen happy, too. Setting optimization flags sometimes does it, changing the microarch sometimes does it (even if it's one that doesn't support AVX-512)... probably others too. This is extremely sensitive to compiler flags.

Barinzaya avatar Oct 15 '24 09:10 Barinzaya

The optimization modes making a difference makes sense: LLVM probably just removes the entire function because the program doesn't have any side effects. It is very hard to get LLVM (with optimizations) to behave in a bug reproduction because it can just remove things.

And even if you get it to not remove your function, it could be optimizing it to very different instructions.

The microarch affecting it is a little weird, especially if it doesn't enable the avx512 features.

laytan avatar Oct 15 '24 09:10 laytan

Per recent findings, it seems like the issue here is that LLVM requires either the evex512 or avx512vl target feature to be able to use AVX-512 instructions. The evex512 feature seems to enable use of the zmm registers with AVX-512 instructions, whereas avx512vl seems to enable use of the xmm/ymm registers with AVX-512 instructions.

Without one or the other, there are no registers that can be used with AVX-512 instructions.
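
For illustration, here is a minimal sketch of the reproduction with avx512vl enabled alongside avx512bw, as suggested earlier in the thread. It assumes that enable_target_feature accepts a comma-separated feature list; I have not re-verified that this variant builds on every microarch and flag combination mentioned above, and running it still requires a CPU that supports both features.

```odin
package mre

import "core:simd"

// Assumption: listing avx512vl next to avx512bw lets LLVM use the
// xmm/ymm registers with AVX-512 instructions, so the v16i1 compare
// from the error above has something it can be selected to.
@(enable_target_feature = "avx512bw,avx512vl")
foo :: proc(src: #simd[32]u16, dst: ^[32]u16) {
	// Same body as the original reproduction: store only the lanes
	// where (src - 16384) < 32768, using wrapping u16 arithmetic.
	simd.masked_store(dst, src, simd.lanes_lt(src - auto_cast 16384, auto_cast 32768))
}

main :: proc() {
	a: [32]u16
	foo({}, &a)
}
```

The same pair of features can presumably be passed on the command line instead (-target-features:avx512bw,avx512vl) if the attribute is removed.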

Barinzaya avatar Jun 22 '25 15:06 Barinzaya

I have to step out for a bit, but I'll try and address it per the discussion on Discord when I get back.

Kelimion avatar Jun 22 '25 16:06 Kelimion