SWAR movmask need not use a multiply high operation

Open Validark opened this issue 7 months ago • 0 comments

Multiply-high instructions are typically more expensive than ordinary multiply instructions in terms of both latency and throughput, and certain ISAs and compilers (looking at you, MSVC) cannot emit a multiply-high without a function call at all.

In the course of writing a SWAR fallback for my Accelerated-Zig-Parser, I realized that the SWAR movmask operation need not use a multiply high. Instead, we can concentrate the bits in the upper byte, then shift to the lowest byte.

Effectively, the constants in your article can be shifted right by the number of bytes in the input (8 bits for 64-bit integers, 4 bits for 32-bit integers):

| old | new |
| --- | --- |
| `0x204081020408100` | `0x2040810204081` |
| `0x2040810` | `0x204081` |

For 64-bit integers, we shift the product right by 56 to get the upper 8 bits, and for 32-bit integers, we shift right by 28 to get the upper 4 bits.
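
As a rough sketch of the idea in C (the function names are hypothetical, and I am assuming the input is first masked so that only the top bit of each byte can be set), the ordinary low-half multiply moves the bit of byte `k` to position `56 + k` (or `28 + k` in the 32-bit case), and a plain shift brings the result down:

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Gather the top bit of each of the 8 bytes of x into an 8-bit mask.
   Byte 0 (least significant) ends up in bit 0 of the result. */
static inline uint32_t swar_movmask8x8(uint64_t x) {
    x &= UINT64_C(0x8080808080808080);        /* keep only the top bit of each byte */
    /* Low 64 bits of the product: the wanted bits land in bits 56..63. */
    return (uint32_t)((x * UINT64_C(0x0002040810204081)) >> 56);
}

/* Same idea for the 4 bytes of a 32-bit integer, giving a 4-bit mask. */
static inline uint32_t swar_movmask8x4(uint32_t x) {
    x &= UINT32_C(0x80808080);                /* keep only the top bit of each byte */
    /* Low 32 bits of the product: the wanted bits land in bits 28..31. */
    return (x * UINT32_C(0x00204081)) >> 28;
}
```

Because the partial products never land on the same bit position, no carries disturb the upper bits, so the truncating multiply is enough.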

Validark, Nov 10 '23 02:11