packed_simd
Optimize u8x8::trailing_zeros for AArch64
LLVM's cttz.v8i8 intrinsic is broken on AArch64 machines: https://github.com/rust-lang-nursery/packed_simd/issues/191
Our current workaround just applies u8::trailing_zeros to each lane. With 8 lanes, that can be quite slow.
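For reference, the per-lane fallback looks roughly like the sketch below. This is not the crate's actual code: `[u8; 8]` stands in for `u8x8`, and the function name `trailing_zeros_per_lane` is hypothetical.

```rust
/// Hypothetical stand-in for the per-lane workaround.
/// `[u8; 8]` plays the role of `u8x8` so the sketch compiles on any target.
fn trailing_zeros_per_lane(v: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (o, lane) in out.iter_mut().zip(v.iter()) {
        // u8::trailing_zeros returns a u32 in 0..=8, so the cast is lossless.
        *o = lane.trailing_zeros() as u8;
    }
    out
}
```

Each lane is processed with a separate scalar operation, so the cost grows with the lane count instead of being a fixed number of vector instructions.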
It could be optimized by adapting LLVM's algorithm to Rust's AArch64 SIMD intrinsics. Some of those intrinsics may be missing, in which case we would have to implement them as well: https://github.com/rust-lang-nursery/stdsimd/issues/40
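One branch-free formulation that could be vectorized is `cttz(x) = popcount(!x & (x - 1))`. Whether this is exactly what LLVM does for this type is an assumption on my part; it is just a well-known identity that only needs lane-wise NOT, AND, wrapping subtract and a per-byte population count, all of which NEON provides as vector instructions. The sketch below checks the identity on a `[u8; 8]` stand-in; the function name is hypothetical, and a real version would replace each per-lane step with the corresponding AArch64 intrinsic.

```rust
/// Hypothetical sketch: trailing zeros via popcount(!x & (x - 1)).
/// Each step inside the loop is a lane-wise operation, so in a real
/// implementation the loop would become a handful of vector instructions.
fn trailing_zeros_via_popcount(v: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (o, &x) in out.iter_mut().zip(v.iter()) {
        // x - 1 flips the lowest set bit and every bit below it;
        // masking with !x keeps only the bits that were trailing zeros.
        let mask = !x & x.wrapping_sub(1);
        // The number of set bits in the mask is the trailing-zero count.
        // For x == 0 the mask is 0xFF, which gives 8 as expected.
        *o = mask.count_ones() as u8;
    }
    out
}
```

Comparing it against `u8::trailing_zeros` for all 256 byte values is a cheap way to validate any vectorized version before switching the crate over.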