LLVM ctpop.v16i8 intrinsic breaks on s390x
This works as a minimal LLVM example: https://godbolt.org/z/_nnrkF
Unfortunately godbolt's nightly Rust is broken so I can't reproduce it in Rust: https://github.com/mattgodbolt/compiler-explorer/issues/1165
u8x16::count_ones/zeros will likely need a workaround until fixed.
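One possible shape for such a workaround, as a minimal sketch only: compute the counts lane by lane with the scalar `u8::count_ones` / `u8::count_zeros` methods, so the Rust source never calls the vector intrinsic. The `Lanes` alias and both function names are hypothetical stand-ins, not the crate's actual types or API, and a plain `[u8; 16]` array is assumed in place of the real `u8x16`.

```rust
/// Hypothetical stand-in for a 16-lane `u8` vector (assumption: the real
/// `u8x16` type can be converted to and from a plain array like this).
pub type Lanes = [u8; 16];

/// Per-lane population count using scalar `u8::count_ones`.
/// No vector `ctpop` intrinsic is called from the Rust source.
pub fn count_ones_fallback(v: Lanes) -> Lanes {
    let mut out = [0u8; 16];
    for (o, lane) in out.iter_mut().zip(v.iter()) {
        // A u8 has at most 8 set bits, so the cast to u8 cannot truncate.
        *o = lane.count_ones() as u8;
    }
    out
}

/// Per-lane count of zero bits using scalar `u8::count_zeros`.
pub fn count_zeros_fallback(v: Lanes) -> Lanes {
    let mut out = [0u8; 16];
    for (o, lane) in out.iter_mut().zip(v.iter()) {
        *o = lane.count_zeros() as u8;
    }
    out
}
```

This trades speed for correctness on the affected target, so it would presumably be gated on s390x only and removed once the underlying issue is fixed.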
Reported to LLVM: https://bugs.llvm.org/show_bug.cgi?id=39730
(ctlz.v16i8 & cttz.v16i8 break as well)
This is the IR generated by rustc:
define void @_ZN7example5ctpop17h99089f7f1b127411E(<16 x i8>* noalias nocapture sret dereferenceable(16), <16 x i8>* noalias nocapture dereferenceable(16) %x) unnamed_addr #0 {
  %arg = alloca <16 x i8>, align 16
  %1 = load <16 x i8>, <16 x i8>* %x, align 16
  store <16 x i8> %1, <16 x i8>* %arg, align 16
  call void @llvm.ctpop.v16i8(<16 x i8>* noalias nocapture sret dereferenceable(16) %0, <16 x i8>* noalias nocapture dereferenceable(16) %arg)
  br label %bb1
bb1: ; preds = %start
  ret void
}
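Which is clearly wrong: the intrinsic is called with pointer arguments and an sret return slot, whereas llvm.ctpop.v16i8 takes and returns a <16 x i8> by value. Using the unadjusted ABI generates the correct code.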
I'm not sure where exactly that places the bug -- possibly rustc should always assume unadjusted when linking LLVM intrinsics, as nothing else really makes sense in that context, and it leaves less room for error in target ABI handling?
"possibly rustc should always assume unadjusted when linking LLVM intrinsics, as nothing else really makes sense in that context"
I think it makes sense to fix this upstream - cc @alexcrichton @rkruppe
Sounds like a good idea to me!