riscv 64-bit popcount uses inefficient constant materialization

Open efriedma-quic opened this issue 11 months ago • 4 comments

Consider:

int a(unsigned long long x) { return __builtin_popcountll(x); }

Targeting rv64, this generates:

a:
        srli    a1, a0, 1
        lui     a2, 349525
        addiw   a2, a2, 1365
        slli    a3, a2, 32
        add     a2, a2, a3
        and     a1, a1, a2
        sub     a0, a0, a1
        lui     a1, 209715
        addiw   a1, a1, 819
        slli    a2, a1, 32
        add     a1, a1, a2
        and     a2, a0, a1
        srli    a0, a0, 2
        and     a0, a0, a1
        add     a0, a0, a2
        srli    a1, a0, 4
        add     a0, a0, a1
        lui     a1, 61681
        addiw   a1, a1, -241
        slli    a2, a1, 32
        add     a1, a1, a2
        and     a0, a0, a1
        lui     a1, 4112
        addiw   a1, a1, 257
        slli    a2, a1, 32
        add     a1, a1, a2
        mul     a0, a0, a1
        srli    a0, a0, 56
        ret
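For readers who prefer C to assembly, the sequence above can be read back as the usual bit-parallel popcount. This is only a sketch reconstructed from the listing, not the code LLVM actually expands from; the function names `popcount64_swar` and `check_popcount64_swar` are made up for illustration, and the check relies on the GCC/Clang builtin used in the original example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; each statement mirrors one group of instructions above. */
static int popcount64_swar(uint64_t x) {
    /* srli/and/sub: each 2-bit field now holds the count of its two bits. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* and/srli/and/add: each 4-bit field holds the count of its four bits. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* srli/add/and: each byte holds the count of its eight bits. */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* mul/srli: the multiply sums all byte counts into the top byte. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Hypothetical helper: cross-check one value against the compiler builtin. */
static void check_popcount64_swar(uint64_t x) {
    assert(popcount64_swar(x) == __builtin_popcountll(x));
}
```

Each of the four 64-bit masks in this function corresponds to one lui/addiw/slli/add group in the generated code.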

There are 4 constant integers involved in this computation: 0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F, and 0x0101010101010101. The way we're materializing these constants is not efficient. In isolation, each one takes 4 instructions to materialize (lui, addiw, slli, add), which I think is optimal, but the constants are related to each other, so three of them could be derived from 0x0F0F0F0F0F0F0F0F with a shift and a logical operation each:

0x3333333333333333 == (0x0F0F0F0F0F0F0F0F ^ (0x0F0F0F0F0F0F0F0F << 2))
0x5555555555555555 == (0x3333333333333333 ^ (0x3333333333333333 << 1))
0x0101010101010101 == (0x0F0F0F0F0F0F0F0F & (0x0F0F0F0F0F0F0F0F >> 3))
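The relations between the constants can be checked directly in C. The sketch below is illustrative only: `check_derived_constants` and `popcount64_derived` are hypothetical names, and a C compiler will constant-fold the derived masks, so this demonstrates the arithmetic rather than the code generation change the issue asks for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: verify the three identities on 64-bit values. */
static void check_derived_constants(void) {
    const uint64_t m4 = 0x0F0F0F0F0F0F0F0FULL;
    const uint64_t m2 = 0x3333333333333333ULL;
    assert((m4 ^ (m4 << 2)) == m2);
    assert((m2 ^ (m2 << 1)) == 0x5555555555555555ULL);
    assert((m4 & (m4 >> 3)) == 0x0101010101010101ULL);
}

/* Hypothetical name: popcount with only one mask written out in full. */
static int popcount64_derived(uint64_t x) {
    const uint64_t m4  = 0x0F0F0F0F0F0F0F0FULL; /* the one constant built from scratch */
    const uint64_t m2  = m4 ^ (m4 << 2);        /* 0x3333333333333333 */
    const uint64_t m1  = m2 ^ (m2 << 1);        /* 0x5555555555555555 */
    const uint64_t h01 = m4 & (m4 >> 3);        /* 0x0101010101010101 */

    x = x - ((x >> 1) & m1);
    x = (x & m2) + ((x >> 2) & m2);
    x = (x + (x >> 4)) & m4;
    return (int)((x * h01) >> 56);
}
```

If each derived mask costs one shift plus one logical instruction, materialization would take roughly 4 + 3 × 2 = 10 instructions instead of 16, at the price of keeping more values live in registers. Whether that trade is profitable in practice is an assumption here, not something measured.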

efriedma-quic, Mar 21 '24 22:03