Jörn Horstmann
> Does this affect codegen for non-x86 platforms? Seems to have a very similar effect for target `aarch64-unknown-linux-gnu`. Example with simple `select` (using 128 bit registers now): ```rust pub fn...
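The `select` pattern referred to above can be sketched in scalar form; this is an illustrative stand-in (the names and shapes here are not from the linked example) for the per-element choice that the compiler can lower to a single vector blend instruction on both x86 and aarch64:

```rust
// Hypothetical scalar form of a per-element `select`: for each lane,
// pick `a[i]` when the condition holds, otherwise `b[i]`. A vectorized
// version evaluates all lanes at once using a comparison mask.
fn select(cond: &[bool; 4], a: &[i32; 4], b: &[i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = if cond[i] { a[i] } else { b[i] };
    }
    out
}

fn main() {
    let picked = select(&[true, false, true, false], &[1, 2, 3, 4], &[5, 6, 7, 8]);
    assert_eq!(picked, [1, 6, 3, 8]);
    println!("ok");
}
```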
That `is_hex` function from rust-lang/portable-simd#303 is so close to getting vectorized though. ```rust pub fn is_hex_mask(chunk: &[u8; 16]) -> bool { let x = u8x16::from_array(*chunk); let m1 = x.simd_gt(splat(b'0' -...
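For reference, the per-byte predicate that the `is_hex_mask` snippet above evaluates across all 16 lanes at once can be written in scalar form; this sketch uses standard, stable Rust and illustrative names rather than the exact code from rust-lang/portable-simd#303:

```rust
// Scalar sketch of the per-lane check: a byte is an ASCII hex digit if
// it falls in one of the ranges '0'..='9', 'a'..='f', or 'A'..='F'.
fn is_hex_byte(b: u8) -> bool {
    (b >= b'0' && b <= b'9')
        || (b >= b'a' && b <= b'f')
        || (b >= b'A' && b <= b'F')
}

// The SIMD version performs all 16 range comparisons in one 128-bit
// register and reduces the resulting lane mask to a single bool.
fn is_hex(chunk: &[u8; 16]) -> bool {
    chunk.iter().all(|&b| is_hex_byte(b))
}

fn main() {
    assert!(is_hex(b"0123456789abcdef"));
    assert!(!is_hex(b"0123456789abcdeg"));
    println!("ok");
}
```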
That was an interesting rebase, with the tests having moved in the meantime. Sorry for taking so long to come back to this PR.
@Dylan-DPC I updated the PR and also adjusted the new masked load/store intrinsics to use the same logic. I also added another assembly test for masked load that shows there...
> Do all the pre-EVEX examples need AVX2? I thought vmaskmov was an AVX instruction? `psllw/d/q` on ymm regs is only AVX2; with AVX only, those would use twice the...
@bors r=@workingjubilee
Hi @jorgecarleitao and @sunchao, does this PR also need to update the thrift dependency to the latest version (0.15.0) or should I open a separate issue for the update? I...
r? @the8472 The assembly for x86 and aarch64 can also be seen at https://rust.godbolt.org/z/x6T65nE8E
@Marcondiro only the microbenchmark included in this PR. On my machine (Intel i9-11900KB) the performance increases by nearly 3x. This is without any target-specific compiler flags, rerunning them now with:...
Thanks for running the benchmarks; glad that there is no regression on arm. The improvement on x86 mostly comes from the use of the `pmovmskb` instruction, the equivalent on arm...
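To make the `pmovmskb` speedup concrete, here is a hedged scalar sketch of what that instruction computes (the function name is illustrative): it packs the most significant bit of each of 16 bytes into one 16-bit integer, so an entire byte-comparison mask can be tested or scanned with a single scalar operation instead of 16 byte checks.

```rust
// Scalar model of x86 `pmovmskb`: collect the top bit of each byte
// lane into bit `i` of a 16-bit result.
fn movemask(bytes: &[u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, &b) in bytes.iter().enumerate() {
        // (b >> 7) is 1 exactly when the byte's MSB is set, which is
        // how SIMD comparison results (0xFF / 0x00 per lane) encode
        // true and false.
        mask |= ((b >> 7) as u16) << i;
    }
    mask
}

fn main() {
    let mut lanes = [0u8; 16];
    lanes[0] = 0xFF; // lane 0 "true"
    lanes[3] = 0xFF; // lane 3 "true"
    assert_eq!(movemask(&lanes), 0b1001);
    println!("ok");
}
```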