Jörn Horstmann
> Does this affect codegen for non-x86 platforms? Seems to have a very similar effect for target `aarch64-unknown-linux-gnu`. Example with simple `select` (using 128 bit registers now): ```rust pub fn...
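The `select` pattern referred to above can be sketched in scalar form; this is an illustrative stand-in (the names and shapes here are not from the linked example) for the per-element choice that the compiler can lower to a single vector blend instruction on both x86 and aarch64:

```rust
// Hypothetical scalar form of a per-element `select`: for each lane,
// pick `a[i]` when the condition holds, otherwise `b[i]`. A vectorized
// version evaluates all lanes at once using a comparison mask.
fn select(cond: &[bool; 4], a: &[i32; 4], b: &[i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = if cond[i] { a[i] } else { b[i] };
    }
    out
}

fn main() {
    let picked = select(&[true, false, true, false], &[1, 2, 3, 4], &[5, 6, 7, 8]);
    assert_eq!(picked, [1, 6, 3, 8]);
    println!("ok");
}
```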
That `is_hex` function from rust-lang/portable-simd#303 is so close to getting vectorized though. ```rust pub fn is_hex_mask(chunk: &[u8; 16]) -> bool { let x = u8x16::from_array(*chunk); let m1 = x.simd_gt(splat(b'0' -...
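For reference, the per-byte predicate that the `is_hex_mask` snippet above evaluates across all 16 lanes at once can be written in scalar form; this sketch uses standard, stable Rust and illustrative names rather than the exact code from rust-lang/portable-simd#303:

```rust
// Scalar sketch of the per-lane check: a byte is an ASCII hex digit if
// it falls in one of the ranges '0'..='9', 'a'..='f', or 'A'..='F'.
fn is_hex_byte(b: u8) -> bool {
    (b >= b'0' && b <= b'9')
        || (b >= b'a' && b <= b'f')
        || (b >= b'A' && b <= b'F')
}

// The SIMD version performs all 16 range comparisons in one 128-bit
// register and reduces the resulting lane mask to a single bool.
fn is_hex(chunk: &[u8; 16]) -> bool {
    chunk.iter().all(|&b| is_hex_byte(b))
}

fn main() {
    assert!(is_hex(b"0123456789abcdef"));
    assert!(!is_hex(b"0123456789abcdeg"));
    println!("ok");
}
```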
That was an interesting rebase, with the tests having moved in the meantime. Sorry for taking so long to come back to this PR.
@Dylan-DPC I updated the PR and also adjusted the new masked load/store intrinsics to use the same logic. I also added another assembly test for masked load that shows there...
> Do all the pre-EVEX examples need AVX2? I thought vmaskmov was an AVX instruction? `psllw/d/q` on ymm regs is only AVX2; with AVX only, those would use twice the...
@bors r=@workingjubilee
Hi @jorgecarleitao and @sunchao, does this PR also need to update the thrift dependency to the latest version (0.15.0) or should I open a separate issue for the update? I...
r? @the8472 The assembly for x86 and aarch64 can also be seen at https://rust.godbolt.org/z/x6T65nE8E
@Marcondiro only the microbenchmark included in this PR. On my machine (Intel i9-11900KB) the performance increases by nearly 3x. This is without any target-specific compiler flags, rerunning them now with:...
Thanks for running the benchmarks; glad that there is no regression on arm. The improvement on x86 mostly comes from the use of the `pmovmskb` instruction, the equivalent on arm...
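To make the `pmovmskb` speedup concrete, here is a hedged scalar sketch of what that instruction computes (the function name is illustrative): it packs the most significant bit of each of 16 bytes into one 16-bit integer, so an entire byte-comparison mask can be tested or scanned with a single scalar operation instead of 16 byte checks.

```rust
// Scalar model of x86 `pmovmskb`: collect the top bit of each byte
// lane into bit `i` of a 16-bit result.
fn movemask(bytes: &[u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, &b) in bytes.iter().enumerate() {
        // (b >> 7) is 1 exactly when the byte's MSB is set, which is
        // how SIMD comparison results (0xFF / 0x00 per lane) encode
        // true and false.
        mask |= ((b >> 7) as u16) << i;
    }
    mask
}

fn main() {
    let mut lanes = [0u8; 16];
    lanes[0] = 0xFF; // lane 0 "true"
    lanes[3] = 0xFF; // lane 3 "true"
    assert_eq!(movemask(&lanes), 0b1001);
    println!("ok");
}
```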