CPU pipline parallel reverse bits algorithm
algorithm is explained in the book Hackers Delight You can see it also here Bit Twiddling Hacks at graphics.stanford.edu
Same as the other perf fix. Can you provide a scenario to repro the perf improvement?
We of course love perf improvements!
I have no perf data on this. But this is old stuff, using bitwise ops and utilizing pipelining instead of a loop.
Can you update the comment at the top. If there’s a resource you used to get that algorithm ideally put that link there. Otherwise just delete it.
Once that’s in I should be ready to merge.
By the way if you want to work together on more changes let’s collaborate on https://community.keydb.dev/
@JohnSully great, meanwhile am getting acquainted with the code, and these are just low hanging fruits that quickly identify base on my previous experience.
This one is still delayed on the comment fix.
Hi @JohnSully what comment you need me to fix? I have edited the comment in the commit on top and aslo the comment above the function in dict.c has the url. another source is https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685 chapter 7. https://www.oreilly.com/library/view/hackers-delight/0201914654/0201914654_ch07lev1sec1.html
Hi @tryfinally The comment highlighted in blue. Since you have changed it perhaps you have not pushed to the source branch?
I'm preparing a new release which includes your earlier changes however this one will have to wait to the next release.

done
done
x = (x & 0x55555555) << 1 | (x & 0xAAAAAAAA) >> 1;
Is this line removed by accident?
@miroslavp yes it is by mistake. thanks!
Clang offers a __builtin_bitreverse intrinsic, which on architectures like ARM directly calls the native rbit instruction; might be worth using it when compiling with clang.