toys
toys copied to clipboard
Help needed for AVX2/512 popcount
Dear @WojciechMula,
In some code of our software, we use GNU popcount like this:
// Start of macros and method copied from https://github.com/kimwalisch/libpopcnt
#ifdef GNUC
#define GNUC_PREREQ(x, y)
(GNUC > x || (GNUC == x && GNUC_MINOR >= y))
#else
#define GNUC_PREREQ(x, y) 0
#endif
#ifndef __has_builtin #define __has_builtin(x) 0 #endif
/*
-
This uses fewer arithmetic operations than any other known
-
implementation on machines with fast multiplication.
-
It uses 12 arithmetic operations, one of which is a multiply.
-
http://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation */ static inline uint64_t popcount64(uint64_t x) { uint64_t m1 = 0x5555555555555555ll; uint64_t m2 = 0x3333333333333333ll; uint64_t m4 = 0x0F0F0F0F0F0F0F0Fll; uint64_t h01 = 0x0101010101010101ll;
x -= (x >> 1) & m1; x = (x & m2) + ((x >> 2) & m2); x = (x + (x >> 4)) & m4;
return (x * h01) >> 56; }
// End of macros and method copied from https://github.com/kimwalisch/libpopcnt
... prepare bits code here, which is uint64_t ...
#if GNUC_PREREQ(4, 2) || __has_builtin(__builtin_popcountll) samebits += __builtin_popcountll(bits); #else samebits += popcount64(bits) #endif
We want to use instruction specific popcount, e.g., SSE,AVX2 or AVX512. Do we just use sse_count_byte_popcount and avx2_count_byte_popcount(), but for AVX512 there is not such a _popcount() function to use. Not an expert in SIMD at all but just so surprised at how SIMD can accelerate popcount.
Many thanks,
Jianshu