bitintr

Use faster PEXT/PDEP implementation on older/non-Intel CPUs

Open terrorfisch opened this issue 3 years ago • 0 comments

The ZP7 implementation by Zach Wegner (https://github.com/zwegner/zp7) claims to be faster than the built-in instructions on some AMD architectures for most input masks. According to a Twitter post, the performance of the hardware instructions on some AMD CPUs is input-dependent and much worse than the 1-cycle throughput on Intel.

The ZP7 code is branch-free and probably also faster than the naive loop currently used in bitintr. It uses CLMUL if available.
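To make the comparison concrete, here is a rough sketch of the two approaches in Rust. Neither function is taken from bitintr or ZP7: `pext_naive` and `pdep_naive` are hypothetical stand-ins for a per-bit fallback loop, and `pext_prefix` is the well-known "compress" routine from Hacker's Delight widened to 64 bits, which I am assuming is close in spirit to what ZP7 does. The chain of shift-and-XOR steps computes a prefix XOR, which is the kind of step a carry-less multiply (CLMUL) by an all-ones constant can do in one instruction.

```rust
/// Naive PEXT: one loop iteration per set bit of `mask`.
/// The run time depends on the popcount of the mask.
pub fn pext_naive(value: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut out_bit = 0u32;
    while mask != 0 {
        // Isolate the lowest set bit of the mask.
        let lowest = mask & mask.wrapping_neg();
        if value & lowest != 0 {
            result |= 1u64 << out_bit;
        }
        out_bit += 1;
        // Clear the lowest set bit.
        mask &= mask - 1;
    }
    result
}

/// Naive PDEP: scatter the low bits of `value` to the set bits of `mask`.
pub fn pdep_naive(value: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut in_bit = 0u32;
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg();
        if (value >> in_bit) & 1 != 0 {
            result |= lowest;
        }
        in_bit += 1;
        mask &= mask - 1;
    }
    result
}

/// PEXT without data-dependent branches (Hacker's Delight "compress",
/// extended to 64 bits). Always runs exactly six rounds.
pub fn pext_prefix(value: u64, mask: u64) -> u64 {
    let mut x = value & mask;
    let mut m = mask;
    // mk marks positions that have a zero mask bit directly below them.
    let mut mk = !m << 1;
    for i in 0..6 {
        // Prefix XOR of mk: the step CLMUL could replace.
        let mut mp = mk ^ (mk << 1);
        mp ^= mp << 2;
        mp ^= mp << 4;
        mp ^= mp << 8;
        mp ^= mp << 16;
        mp ^= mp << 32;
        // Bits that must move right by 2^i in this round.
        let mv = mp & m;
        m = (m ^ mv) | (mv >> (1u32 << i));
        let t = x & mv;
        x = (x ^ t) | (t >> (1u32 << i));
        mk &= !mp;
    }
    x
}
```

For example, `pext_prefix(0x12345678, 0x0F0F0F0F)` collects the low nibble of each byte and should give `0x2468`, the same as `pext_naive`. Whether this actually beats the loop would still need to be benchmarked.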

If I find the time, I will write a Rust implementation and benchmark it.

terrorfisch, Jan 31 '21 18:01