llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

riscv popcount lowering shouldn't call __muldi3/__mulsi3

Open efriedma-quic opened this issue 1 year ago • 6 comments

Consider:

int a(unsigned x) { return __builtin_popcount(x); }

With -march=rv32i, this generates a call to __mulsi3. This is likely to be slow; if we don't have "m", we should use shifts instead.

efriedma-quic avatar Mar 21 '24 21:03 efriedma-quic

@llvm/issue-subscribers-backend-risc-v

Author: Eli Friedman (efriedma-quic)

Consider:
int a(unsigned x) { return __builtin_popcount(x); }

With -march=rv32i, this generates a call to __mulsi3. This is likely to be slow; if we don't have "m", we should use shifts instead.

llvmbot avatar Mar 21 '24 22:03 llvmbot

We are using multiply here because this is the default expanding of ISD::CTPOP. https://github.com/llvm/llvm-project/blob/72c729f354d71697a1402720c90b57ff521b6739/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L8653-L8718 We may need custom lowering of ISD::CTPOP when there is no m and Zbb.

wangpc-pp avatar Mar 22 '24 16:03 wangpc-pp

We are using multiply here because this is the default expanding of ISD::CTPOP.

https://github.com/llvm/llvm-project/blob/72c729f354d71697a1402720c90b57ff521b6739/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L8653-L8718

We may need custom lowering of ISD::CTPOP when there is no m and Zbb.

Can we check isOperationLegalOrCustom in the default handler?

topperc avatar Mar 22 '24 17:03 topperc

We are using multiply here because this is the default expanding of ISD::CTPOP. https://github.com/llvm/llvm-project/blob/72c729f354d71697a1402720c90b57ff521b6739/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L8653-L8718

We may need custom lowering of ISD::CTPOP when there is no m and Zbb.

Can we check isOperationLegalOrCustom in the default handler?

Yeah, I think so! But I don't get how to use shifts here...

wangpc-pp avatar Mar 22 '24 17:03 wangpc-pp

x *= 0x01010101; is just x += x << 8; x += x << 16;. Maybe you can save a shift somewhere.

efriedma-quic avatar Mar 22 '24 19:03 efriedma-quic

I think this should be generic optimization opportunity for converting multiplication with specific constants into shift & add. It would not be limited to the popcount use case.

In #79823, I mentioned that (x * 0x08040201) can be converted to { tmp = x + (x << 9); tmp += (tmp << 18); } and that GCC does perform such optimization.

Explorer09 avatar Mar 24 '24 14:03 Explorer09