popcount with clang

Open michael-roe opened this issue 2 years ago • 2 comments

Modern compilers are clever; they can recognize some of the algorithms for popcount and replace them with a popcnt (Intel) or cpop (RISC-V) instruction.

    int count(long x)
    {
        int v = 0;
        while (x != 0) {
            x &= x - 1;
            v++;
        }
        return v;
    }

clang (with -O3 -march=x86-64-v2) turns this loop into a single popcntq %rdi, %rax instruction.

This would suggest that in order to get good performance, VOLK's popcount implementation ought to be one of the ones that is recognized by popular compilers.
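A minimal sketch of how the loop idiom could be checked against a known-good reference, assuming a GCC- or Clang-compatible compiler that provides `__builtin_popcountl`. The names `count_loop`, `count_builtin` and `popcount_variants_agree` are made up for illustration and are not VOLK functions. The sketch takes `unsigned long` rather than `long` so that `x - 1` is well defined for every input.

```c
#include <assert.h>
#include <limits.h>

/* Loop idiom from the post: each iteration clears the lowest set bit,
 * so the loop body runs once per set bit. */
static int count_loop(unsigned long x)
{
    int v = 0;
    while (x != 0) {
        x &= x - 1;
        v++;
    }
    return v;
}

/* Reference implementation using the GCC/Clang builtin. */
static int count_builtin(unsigned long x)
{
    return __builtin_popcountl(x);
}

/* Hypothetical helper: returns 1 if both variants agree on a few samples. */
static int popcount_variants_agree(void)
{
    const unsigned long samples[] = {
        0UL, 1UL, 0xFFUL, 0xF0F0UL, ULONG_MAX >> 1, ULONG_MAX
    };
    const unsigned n = sizeof(samples) / sizeof(samples[0]);

    for (unsigned i = 0; i < n; i++) {
        if (count_loop(samples[i]) != count_builtin(samples[i]))
            return 0;
    }
    return 1;
}
```

Compiling a file like this with `-O3 -march=x86-64-v2 -S` and reading the generated assembly shows whether the loop was replaced by a popcount instruction. Whether the idiom is recognized may depend on the compiler, its version and the target flags, so the result is worth re-checking per toolchain.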

May 15 '22 15:05 michael-roe

Thanks for the hint. Do you have a reference to read? And would you be willing to create a PR with this change?

May 16 '22 07:05 jdemel

I'm thinking about creating a pull request for this, but first I'm trying to get cpu_features working on non-x86 architectures, so that I can test on a CPU that has a popcount instruction but isn't x86.

May 22 '22 07:05 michael-roe