minisketch
Test performance with -mlzcnt
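At the moment the __builtin_clz* builtins compile down to bsrq on x86_64. Compiling with -mlzcnt makes the compiler emit the dedicated lzcntq instruction instead, which also drops the extra xorq $63 fix-up that the bsrq version needs.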
CountBits<unsigned long long> without -mlzcnt:
_Z9CountBitsmi:
.LFB189:
.cfi_startproc
endbr64
xorl %eax, %eax
testq %rdi, %rdi
je .L1
bsrq %rdi, %rdi
movl $64, %eax
xorq $63, %rdi
subl %edi, %eax
CountBits<unsigned long long> with -mlzcnt:
_Z9CountBitsmi:
.LFB189:
.cfi_startproc
endbr64
xorl %eax, %eax
testq %rdi, %rdi
je .L1
movl $64, %eax
lzcntq %rdi, %rdi
subl %edi, %eax
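For anyone who wants to reproduce the two listings, here is a minimal sketch of a function with the same shape (zero check, then 64 minus the leading-zero count). The names count_bits_sketch, count_bits_reference and check_count_bits_sketch are made up for illustration and this is not the library's actual code; it assumes GCC or Clang, since it relies on __builtin_clzll.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the function in the listings above,
// not the library's real implementation.
// Returns the number of bits needed to represent v (0 when v == 0).
inline int count_bits_sketch(std::uint64_t v) {
    // __builtin_clzll has undefined behaviour for 0, hence the early return.
    if (v == 0) return 0;
    return 64 - __builtin_clzll(v);
}

// Portable shift-loop version, used only to cross-check the builtin.
inline int count_bits_reference(std::uint64_t v) {
    int bits = 0;
    while (v != 0) {
        ++bits;
        v >>= 1;
    }
    return bits;
}

// Compares both versions on 0, every power of two, and every value
// of the form 2^k - 1 ... well, on 2^k and on all-ones below bit k+1.
inline void check_count_bits_sketch() {
    assert(count_bits_sketch(0) == count_bits_reference(0));
    for (int shift = 0; shift < 64; ++shift) {
        const std::uint64_t power = std::uint64_t{1} << shift;
        const std::uint64_t ones = power | (power - 1);
        assert(count_bits_sketch(power) == count_bits_reference(power));
        assert(count_bits_sketch(ones) == count_bits_reference(ones));
    }
}
```

Compiling this once with plain -O2 and once with -O2 -mlzcnt, then inspecting the output of -S, should show the same bsrq versus lzcntq difference, though the exact instruction sequence will depend on the compiler version.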
I can't measure how much difference this makes, because my CPU does not support the instruction. I assume @sipa would know right away whether it's worth pursuing.
Related to #80.