toys SWAR movmask need not use a multiply high operation

Open Validark opened this issue 7 months ago • 0 comments

Multiply-high instructions are typically more expensive than plain multiply instructions in both latency and throughput, and certain ISAs and compilers (looking at you, MSVC) do not support emitting a multiply-high without a function call at all.

In the course of writing a SWAR fallback for my Accelerated-Zig-Parser, I realized that the SWAR movmask operation need not use a multiply high. Instead, we can concentrate the bits in the upper byte, then shift to the lowest byte.

Effectively, the constants in your article can be shifted right by the number of bytes in the input (8 bits for the 64-bit constant, 4 bits for the 32-bit one):

| old | new |
| --- | --- |
| 0x204081020408100 | 0x2040810204081 |
| 0x2040810 | 0x204081 |

For 64-bit integers, we shift the product right by 56 to get the upper 8 bits, and for 32-bit integers, we shift right by 28 to get the upper 4 bits.
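For illustration, here is a minimal Rust sketch of the idea, not code from the article or from Accelerated-Zig-Parser. The function and constant names are made up for this example. It assumes the mask is built from the most significant bit of each byte, with byte 0 being the least significant, so the input is first ANDed with `0x80` in every byte. `movmask64_mulhi` is only my reading of what a multiply-high version with the old constant would look like (taking the low byte of the high half of the product). It is there for comparison.

```rust
/// Selects the most significant bit of every byte.
const MSB64: u64 = 0x8080_8080_8080_8080;
const MSB32: u32 = 0x8080_8080;

/// The "new" constants: used with an ordinary (low-half) multiply.
const MAGIC64: u64 = 0x0002_0408_1020_4081;
const MAGIC32: u32 = 0x0020_4081;

/// The "old" 64-bit constant, i.e. MAGIC64 << 8.
const OLD_MAGIC64: u64 = 0x0204_0810_2040_8100;

/// Gathers the top bit of each of the 8 bytes of `x` into an 8-bit mask.
/// Bit i of the result is the MSB of byte i.
pub fn movmask64(x: u64) -> u8 {
    // wrapping_mul keeps only the low 64 bits of the product;
    // the gathered bits land in the top byte, so shift them down by 56.
    ((x & MSB64).wrapping_mul(MAGIC64) >> 56) as u8
}

/// Same idea for 4 bytes: the 4 gathered bits land in the top nibble.
pub fn movmask32(x: u32) -> u8 {
    ((x & MSB32).wrapping_mul(MAGIC32) >> 28) as u8
}

/// Assumed multiply-high variant, for comparison only: takes the low
/// byte of the high 64 bits of the full 128-bit product.
pub fn movmask64_mulhi(x: u64) -> u8 {
    let full = ((x & MSB64) as u128) * (OLD_MAGIC64 as u128);
    (full >> 64) as u8
}
```

Since each partial product lands on a distinct bit position, there are no carries into the top byte, and the two 64-bit versions should agree for every input. For example, `movmask64(0x00FF_00FF_00FF_00FF)` yields `0x55`, because bytes 0, 2, 4 and 6 have their top bit set.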

Nov 10 '23 02:11 Validark
