cpp_weekly
cpp_weekly copied to clipboard
A much faster and (mostly) branchless lower_bound() implementation
Channel
C++Weekly
Topics
lower_bound, branchless programming, performance, cmov
Length
10-20
I've profiled this myself and found that it is consistently faster than std::lower_bound (from libstdc++) using either clang or gcc, although gcc performs better. As expected there is less of a difference for large collections due to memory accesses, but for smaller collections it can indeed be twice as fast as std::lower_bound.
The algorithm this is based on ("Shar's algorithm") is from a book published in 1982.
Code and description here: https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/
Interestingly, replacing the body of the for loop with the following improves the performance for clang but is a pessimization for gcc:
const size_t increment[] = { 0, step };
begin += increment[compare(begin[step], value)];