cpp_weekly icon indicating copy to clipboard operation
cpp_weekly copied to clipboard

A much faster and (mostly) branchless lower_bound() implementation

Open usefulcat opened this issue 2 years ago • 0 comments

Channel

C++Weekly

Topics

lower_bound, branchless programming, performance, cmov

Length

10-20

I've profiled this myself and found that it is consistently faster than std::lower_bound (from libstdc++) using either clang or gcc, although gcc performs better. As expected there is less of a difference for large collections due to memory accesses, but for smaller collections it can indeed be twice as fast as std::lower_bound.

The algorithm this is based on ("Shar's algorithm") is from a book published in 1982.

Code and description here: https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/

Interestingly, replacing the body of the for loop with the following improves the performance for clang but is a pessimization for gcc:

const size_t increment[] = { 0, step };
begin += increment[compare(begin[step], value)];

usefulcat avatar Apr 28 '23 15:04 usefulcat