packed_simd

LLVM ctpop.v16i8 intrinsic breaks on s390x

Open TheIronBorn opened this issue 7 years ago • 3 comments

This works as a minimal LLVM example: https://godbolt.org/z/_nnrkF

Unfortunately, godbolt's nightly Rust is broken, so I can't reproduce it in Rust: https://github.com/mattgodbolt/compiler-explorer/issues/1165

u8x16::count_ones/zeros will likely need a workaround until fixed.

Reported to LLVM: https://bugs.llvm.org/show_bug.cgi?id=39730

(ctlz.v16i8 & cttz.v16i8 break as well)
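One possible shape for such a workaround, as a minimal sketch only: compute the counts lane by lane with scalar code so that no `llvm.ctpop.v16i8` call is emitted. The function names `count_ones_lanes` and `count_zeros_lanes` are hypothetical, and a plain `[u8; 16]` stands in for `u8x16` so the example builds on stable Rust; this is not the fix that packed_simd adopted.

```rust
/// Lane-wise population count over 16 bytes using scalar `u8::count_ones`.
/// Hypothetical helper: a plain array stands in for `u8x16`.
fn count_ones_lanes(x: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, lane) in out.iter_mut().zip(x.iter()) {
        // `count_ones` returns u32; the result is at most 8, so it fits in u8.
        *o = lane.count_ones() as u8;
    }
    out
}

/// Lane-wise zero count: the zeros of a lane are the ones of its complement.
fn count_zeros_lanes(x: [u8; 16]) -> [u8; 16] {
    count_ones_lanes(x.map(|lane| !lane))
}
```

Calling `count_ones_lanes([0xFF; 16])` gives 8 in every lane. The same per-lane pattern would apply to `leading_zeros` and `trailing_zeros`, at the cost of giving up the vector instruction on the affected target.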

TheIronBorn • Nov 20 '18 20:11

This is the IR generated by rustc:

define void @_ZN7example5ctpop17h99089f7f1b127411E(<16 x i8>* noalias nocapture sret dereferenceable(16), <16 x i8>* noalias nocapture dereferenceable(16) %x) unnamed_addr #0 {
  %arg = alloca <16 x i8>, align 16
  %1 = load <16 x i8>, <16 x i8>* %x, align 16
  store <16 x i8> %1, <16 x i8>* %arg, align 16
  call void @llvm.ctpop.v16i8(<16 x i8>* noalias nocapture sret dereferenceable(16) %0, <16 x i8>* noalias nocapture dereferenceable(16) %arg)
  br label %bb1

bb1: ; preds = %start
  ret void
}

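Which is clearly wrong: the call to @llvm.ctpop.v16i8 passes the operand and the result indirectly through pointers, whereas the LLVM intrinsic takes and returns <16 x i8> by value. Using the unadjusted ABI generates the correct code.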

I'm not sure where exactly that places the bug -- possibly rustc should always assume unadjusted when linking LLVM intrinsics, as nothing else really makes sense in that context, and it leaves less room for error in target ABI handling?

nikic • Feb 12 '19 22:02

> possibly rustc should always assume unadjusted when linking LLVM intrinsics, as nothing else really makes sense in that context

I think it makes sense to fix this upstream - cc @alexcrichton @rkruppe

gnzlbg • Feb 12 '19 22:02

Sounds like a good idea to me!

alexcrichton • Feb 13 '19 14:02