stdarch
stdarch copied to clipboard
AVX-512f intrinsics fail to compile with MemorySanitizer
I found an issue when using the memory sanitizer (which requires rebuilding the standard library with extra flags). The compiler has problems generating code for _mm_cvt_roundss_u32 or _mm512_shuffle_ps.
I don't have a CPU supporting these, but enabling sanitizers does require linking everything due to warts in linkers and LLVM's coverage measurement runtime.
Here is a PR showing how to test this from the rust repository: https://github.com/rust-lang/rust/pull/79382
Alternatively, it may be slightly faster to test it like this (also from a rust checkout):
time RUSTFLAGS_NOT_BOOTSTRAP="-Cpasses=sancov -Clink-dead-code -Zsanitizer=memory -C codegen-units=1" ./x.py build -i library/core
The errors look like this:
LLVM ERROR: Cannot select: 0x7f96e9336bf0: v64i8 = X86ISD::PALIGNR 0x7f96e93296b0, 0x7f96e9381fe8, TargetConstant:i8<8>
0x7f96e93296b0: v64i8,ch = load<(load 64 from %ir.2245)> 0x7f96f547cba8, 0x7f96e9337270, undef:i64
0x7f96e9337270: i64 = xor 0x7f96e93298b8, 0x7f96e9279c58
0x7f96e93298b8: i64,ch = CopyFromReg 0x7f96f547cba8, Register:i64 %1119
0x7f96e93424a0: i64 = Register %1119
0x7f96e9279c58: i64 = AssertZext 0x7f96e9272c00, ValueType:ch:i47
0x7f96e9272c00: i64,ch = CopyFromReg 0x7f96f547cba8, Register:i64 %6
0x7f96e927fad0: i64 = Register %6
0x7f96e931fb58: i64 = undef
0x7f96e9381fe8: v64i8 = bitcast 0x7f96e9344420
0x7f96e9344420: v16i32,ch = CopyFromReg 0x7f96f547cba8, Register:v16i32 %782
Compiling rustc-std-workspace-core v1.99.0 (/home/g2p/src/github.com/rust-lang/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/rustc-std-workspace-core)
0x7f96e927a410: v16i32 = Register %782
0x7f96e93820b8: i8 = TargetConstant<8>
In function: _ZN4core9core_arch3x867avx512f17_mm512_shuffle_ps17h2adad9c5dc64a280E
And
PHI node operands are not the same type as the result!
%_msphi_s = phi i32 [ %42, %38 ], [ %35, %31 ], [ %28, %24 ], [ %21, %17 ], [ %14, %10 ]
in function _ZN4core9core_arch3x867avx512f19_mm_cvt_roundss_u3217h42028e7a281c0c10E
LLVM ERROR: Broken function found, compilation aborted!
Looks like a bug in MemorySanitizer instrumentation pass, I would recommend reporting upstream.
#include <immintrin.h>
unsigned test_mm_cvt_roundss_u32(__m128 __A) {
return _mm_cvt_roundss_u32(__A, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
}
# Note clang has assertions disabled.
$ clang -march=skylake-avx512 a.c -emit-llvm -S -fsanitize=memory
$ opt a.ll
opt: a.ll:36:9: error: stored value and pointer type do not match
store <4 x i32> %11, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8
# Note opt has assertions enabled.
$ clang -march=skylake-avx512 a.c -emit-llvm -S
$ opt a.ll -msan
opt: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:2684: void {anonymous}::MemorySanitizerVisitor::handleVectorConvertIntrinsic(llvm::IntrinsicInst&, int): Assertion `CopyOp->getType() == I.getType()' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
I can try to forward your reproducer, but for now I'm still waiting for Bugzilla account approval.
Oh, I forgot about those small roadblocks when reporting bugs to LLVM. I opened https://bugs.llvm.org/show_bug.cgi?id=48298.
Thanks, glad it got fixed already!
I was not able to get a similar minimal reduction for _mm512_shuffle_ps.
Because clang is a beast to build (the linker runs out of memory), I don't have a clang build with assertions enabled. The above commands seem to work with this in a.c
#include <immintrin.h>
__m512 test_mm512_shuffle_ps(__m512 __M, __m512 __V) {
return _mm512_shuffle_ps(__M, __V, 8);
}
I however get _mm512_shuffle_ps code generation errors with this overall test in a rust checkout:
RUSTFLAGS_NOT_BOOTSTRAP="-Cpasses=sancov -Clink-dead-code -Zsanitizer=memory -C codegen-units=1" ./x.py build -i library/core
Which clang version did you build? The one in https://github.com/rust-lang/llvm-project?
I however get _mm512_shuffle_ps code generation errors with this overall test in a rust checkout:
RUSTFLAGS_NOT_BOOTSTRAP="-Cpasses=sancov -Clink-dead-code -Zsanitizer=memory -C codegen-units=1" ./x.py build -i library/core
That command builds the standard library with the bootstrap compiler, which I think currently uses an older version than master. You should try (you already tried that in the rust PR that I only now saw)RUSTFLAGS="-Cpasses=sancov -Clink-dead-code -Zsanitizer=memory -C codegen-units=1" cargo +nightly build -Zbuild-std --target x86_64-unknown-linux-gnu I think. (assuming that you use x86_64-unknown-linux-gnu and have the rust-src component installed)
First I cherry-picked the fix for the cvt functions from upstream llvm (and committed the llvm submodule in the rust repo), ran my "build core with instrumentation" test, which proved that _mm512_shuffle_ps needs an independent fix.
For more in-depth tests, and hopefully a reduction, I tried to get clang built from a rust checkout using
RUSTBUILD_FORCE_CLANG_BASED_TESTS=1 ./x.py build --stage 1
but ld kept getting killed by the OOM killer, and using LLD failed in a different way (ld.lld: error: asan_malloc_linux.cpp:(.debug_loc+0x222F70): has non-ABS relocation R_386_GOTOFF against symbol 'alloc_memory_for_dlsym').
#include <immintrin.h>
__m512 test_mm512_shuffle_ps(__m512 __M, __m512 __V) {
return _mm512_shuffle_ps(__M, __V, 78);
}
$ clang -cc1 -target-feature +avx512f -ffreestanding -triple x86_64-unknown-linux-gnu -x c a.c -internal-isystem /usr/lib64/clang/11.0.0/include -S -emit-obj -fsanitize=memory
clang: llvm/lib/Target/X86/X86ISelLowering.cpp:12493: llvm::SDValue lowerShuffleAsByteRotate(const llvm::SDLoc&, llvm::MVT, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>, const llvm::X86Subtarget&, llvm::SelectionDAG&): Assertion `(!VT.is512BitVector() || Subtarget.hasBWI()) && "512-bit PALIGNR requires BWI instructions"' failed.
Reduced:
; ModuleID = 'a.c'
source_filename = "a.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone
define <16 x i32> @shuffle(<16 x i32> %a, <16 x i32> %b) local_unnamed_addr #0 {
entry:
%c = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 2, i32 3, i32 16, i32 17, i32 6, i32 7, i32 20, i32 21, i32 10, i32 11, i32 24, i32 25, i32 14, i32 15, i32 28, i32 29>
ret <16 x i32> %c
}
attributes #0 = { norecurse nounwind readnone "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="512" "no-builtins" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 12.0.0 (https://github.com/llvm/llvm-project 530c69e90964444bc916d38b337105ab44f0961b)"}
$ llc a.ll
llc: llvm/lib/Target/X86/X86ISelLowering.cpp:12493: llvm::SDValue lowerShuffleAsByteRotate(const llvm::SDLoc&, llvm::MVT, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>, const llvm::X86Subtarget&, llvm::SelectionDAG&): Assertion `(!VT.is512BitVector() || Subtarget.hasBWI()) && "512-bit PALIGNR requires BWI instructions"' failed.
Thanks! Forwarded: https://bugs.llvm.org/show_bug.cgi?id=48322