volk
popcount with clang
Modern compilers are clever: they can recognize some popcount algorithms and replace them with a single popcnt (x86) or cpop (RISC-V) instruction.
```c
int count(long x) {
    int v = 0;
    while (x != 0) {
        x &= x - 1;
        v++;
    }
    return v;
}
```
Clang (with `-O3 -march=x86-64-v2`) turns this loop into `popcntq %rdi, %rax`.
This suggests that, to get good performance, VOLK's popcount implementation should use one of the idioms that popular compilers recognize.
Thanks for the hint. Do you have a reference to read? And would you be willing to create a PR with this change?
I'm thinking about creating a pull request for this, but I'm trying to get cpu_features working on non-x86 architectures first, so I can test on a CPU that has popcount but isn't x86.