Jan Wassenberg
Some other suggestions for potentially useful patterns, described as lane indices of the resulting 32-bit lanes: 2301 (swapping 32-bit pairs), 1032 (swapping 64-bit pairs), 0321/2103 (rotate right/left), 0123 (reverse). All...
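To make the lane-index notation concrete, here is a minimal scalar sketch. It assumes the digits are read from the most-significant result lane down to lane 0, and that each digit names the input lane copied into that position. `ShufflePattern` and `Lanes4` are hypothetical names for illustration only, not an existing API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Four 32-bit lanes of one 128-bit vector; index 0 is the least-significant lane.
using Lanes4 = std::array<uint32_t, 4>;

// Hypothetical helper: applies a pattern such as "2301".
// Assumed convention: pattern[0] describes result lane 3, pattern[3] describes
// result lane 0; each digit is the index of the input lane to copy.
inline Lanes4 ShufflePattern(const Lanes4& in, std::string_view pattern) {
  assert(pattern.size() == 4);
  Lanes4 out{};
  for (size_t i = 0; i < 4; ++i) {
    const char digit = pattern[i];
    assert(digit >= '0' && digit <= '3');
    out[3 - i] = in[static_cast<size_t>(digit - '0')];
  }
  return out;
}
```

Under that convention, with input lanes `{10, 11, 12, 13}` (lane 0 first), "2301" exchanges neighbouring 32-bit lanes, "1032" exchanges the two 64-bit halves, "0321" and "2103" rotate by one lane in opposite directions, and "0123" reverses the lane order.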
+1 to these being useful. @nemequ we can consider them 'relaxed' if we extend the definition to "each 128-bit block within a vector", right? SVE2 does that (unfortunately it's an...
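As a rough illustration of what "each 128-bit block within a vector" could mean, the scalar sketch below applies one fixed 4-lane permutation independently to every group of four 32-bit lanes. `ShufflePerBlock` is a hypothetical name, and the sketch assumes the lane count is a multiple of four; it models the semantics only and says nothing about how any particular ISA implements them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: perm[i] is the input lane, within its own 128-bit
// block, that is copied to result lane i of the same block.
inline std::vector<uint32_t> ShufflePerBlock(const std::vector<uint32_t>& in,
                                             const std::array<size_t, 4>& perm) {
  assert(in.size() % 4 == 0);  // assumption: whole 128-bit blocks only
  std::vector<uint32_t> out(in.size());
  for (size_t block = 0; block < in.size() / 4; ++block) {
    for (size_t i = 0; i < 4; ++i) {
      assert(perm[i] < 4);
      // Lanes never cross a block boundary.
      out[block * 4 + i] = in[block * 4 + perm[i]];
    }
  }
  return out;
}
```

Because the permutation never crosses a block boundary, the result for a longer vector is the 128-bit result repeated per block, which is what makes the definition independent of vector length.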
@penzn OK :) I'd also welcome both v128 and flexible AES.
> I don't think it is possible to safely and efficiently emulate these instructions on scalar-only VMs

I recently implemented a [constant-time fallback using basic SIMD only](https://github.com/google/highway/commit/7d080f1dcdb798dd8661951aca3aa9b0a5dd352c).
The FMA count also sounds very useful, but the original report seems to focus on the AVX-512 turbo clock. Even non-complex 512-bit instructions (e.g. XOR) apparently cause heavy throttling on...
Hi, just happened across this issue while searching. Have you seen https://github.com/google/highway ? It's a C++ wrapper over intrinsics that supports SVE, RISC-V, AVX-512 and others. Would be happy to...
Sorry about the delayed reply, I was out on Thu/Fri. Ah, that's a clever implementation, thanks for sharing :) We'd be happy to add those saturated adds if you or...
@JonLiu1993 Oh, great to hear Highway is in vcpkg, thanks for reaching out :) We'd be happy to add this to the readme. Are you able to sign the CLA?
Hi @laoshaw, thanks for reaching out. This is a very interesting topic. We do have such a benchmark for vectorized quicksort, see hwy/contrib/sort/bench_*. We've tested this on x86, NEON...
Nice, thank you @johnplatts for sharing the idea. Will be happy to add these soon. Would you like to have a quick chat via video call to exchange notes?