toys icon indicating copy to clipboard operation
toys copied to clipboard

Help needed for AVX2/512 popcount

Open jianshu93 opened this issue 5 months ago • 1 comments

Dear @WojciechMula,

In some code of our software, we use GNU popcount like this:

// Start of macros and method copied from https://github.com/kimwalisch/libpopcnt

#ifdef GNUC #define GNUC_PREREQ(x, y)
(GNUC > x || (GNUC == x && GNUC_MINOR >= y)) #else #define GNUC_PREREQ(x, y) 0 #endif

#ifndef __has_builtin #define __has_builtin(x) 0 #endif

/*

  • This uses fewer arithmetic operations than any other known

  • implementation on machines with fast multiplication.

  • It uses 12 arithmetic operations, one of which is a multiply.

  • http://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation */ static inline uint64_t popcount64(uint64_t x) { uint64_t m1 = 0x5555555555555555ll; uint64_t m2 = 0x3333333333333333ll; uint64_t m4 = 0x0F0F0F0F0F0F0F0Fll; uint64_t h01 = 0x0101010101010101ll;

    x -= (x >> 1) & m1; x = (x & m2) + ((x >> 2) & m2); x = (x + (x >> 4)) & m4;

    return (x * h01) >> 56; }

// End of macros and method copied from https://github.com/kimwalisch/libpopcnt

... prepare bits code here, which is uint64_t ...

#if GNUC_PREREQ(4, 2) || __has_builtin(__builtin_popcountll) samebits += __builtin_popcountll(bits); #else samebits += popcount64(bits) #endif

We want to use instruction specific popcount, e.g., SSE,AVX2 or AVX512. Do we just use sse_count_byte_popcount and avx2_count_byte_popcount(), but for AVX512 there is not such a _popcount() function to use. Not an expert in SIMD at all but just so surprised at how SIMD can accelerate popcount.

Many thanks,

Jianshu

jianshu93 avatar Jan 15 '24 06:01 jianshu93