Binary search on small maps
For smallish maps, a linear search (below some threshold) will outperform a binary search, e.g. when converting a hex character to its decimal counterpart:
```cpp
#include <frozen/map.h>

constexpr frozen::map<char, char, 22> map_lookup {
    { '0', 0 }, { '1', 1 }, { '2', 2 }, { '3', 3 }, { '4', 4 }, { '5', 5 }, { '6', 6 }, { '7', 7 }, { '8', 8 },
    { '9', 9 }, { 'a', 10 }, { 'b', 11 }, { 'c', 12 }, { 'd', 13 }, { 'e', 14 }, { 'f', 15 }, { 'A', 10 }, { 'B', 11 },
    { 'C', 12 }, { 'D', 13 }, { 'E', 14 }, { 'F', 15 }
};
```
```cpp
#include <array>

int table_lookup ( char s_ ) noexcept {
    struct KeyValue { const char key, value; };
    // Plain linear scan over a small, statically initialized table.
    static constexpr std::array<KeyValue, 22> table { {
        { '0', 0 }, { '1', 1 }, { '2', 2 }, { '3', 3 }, { '4', 4 }, { '5', 5 }, { '6', 6 }, { '7', 7 }, { '8', 8 },
        { '9', 9 }, { 'a', 10 }, { 'b', 11 }, { 'c', 12 }, { 'd', 13 }, { 'e', 14 }, { 'f', 15 }, { 'A', 10 }, { 'B', 11 },
        { 'C', 12 }, { 'D', 13 }, { 'E', 14 }, { 'F', 15 }
    } };
    for ( auto [ k, v ] : table )
        if ( k == s_ )
            return v;
    return -1; // not a hex digit
}
```
The second approach is about 40% faster than the first one.
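For context, here is a minimal sketch of the kind of micro-benchmark behind such a claim. This is not the gist linked below: the loop bound, the `i % 22` key pattern, and the `sum` sink are my own illustrative assumptions, and the predictable key sequence lets the branch predictor help the linear scan more than a realistic workload would.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Assumes the map_lookup and table_lookup definitions above are in scope.
template <typename Lookup>
void bench ( Lookup lookup, const char * name ) {
    constexpr char digits[ ] = "0123456789abcdefABCDEF"; // the 22 valid keys
    std::uint64_t sum = 0; // observable result, so the loop is not elided
    auto const start = std::chrono::high_resolution_clock::now ( );
    for ( std::uint32_t i = 0; i < 100'000'000u; ++i )
        sum += static_cast<std::uint64_t> ( lookup ( digits[ i % 22 ] ) );
    auto const stop = std::chrono::high_resolution_clock::now ( );
    auto const ms = std::chrono::duration_cast<std::chrono::milliseconds> ( stop - start ).count ( );
    std::printf ( "%s: sum = %llu, time = %lld ms\n", name,
                  static_cast<unsigned long long> ( sum ), static_cast<long long> ( ms ) );
}

int main ( ) {
    bench ( [ ] ( char c ) { return table_lookup ( c ); }, "table_lookup" );
    bench ( [ ] ( char c ) { return static_cast<int> ( map_lookup.at ( c ) ); }, "frozen_map" );
    return 0;
}
```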
@degski Can you give a full example program that shows that 40%? I tried but did not find such big differences. :-(
@RokerHRO https://gist.github.com/degski/a10a66d12971fa5709494af1ec131a32 (it's actually 50%; table_lookup00() is faster still, by a lot, but it is no longer comparable to the frozen_map lookup).
@degski: OMG, that is far longer than a "minimal testcase"; I didn't even fully understand that source. I'll write my own test case / benchmark program… :-(
@RokerHRO LOL, you didn't ask for a "minimal testcase", you asked for a "full example program". That is what it is: it tests the various implementations of converting a hex character string to its numeric representation, as described in the referenced blog post. It should compile without issue on *nix with GCC or Clang; on Windows you'll probably need Clang.
Now that you've asked for a minimal test case, I'll put one together and let you know.
@RokerHRO I've stripped all the garbage out; here it is: https://gist.github.com/degski/accd357a6f5c93c70d4e623a40b160c4 . What these functions do is described here: https://lemire.me/blog/2019/04/17/parsing-short-hexadecimal-strings-efficiently/
I also realize that the way I express "how much faster" [it works in my mind, but probably not for others] is confusing: when I say 40% faster, I mean that the program runs in 60% of the time of the reference program; i.e. 50% faster means twice as fast.
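To spell out the arithmetic: if the reference takes time t and the new code takes 0.6·t, the conventional speedup factor is t / (0.6·t) ≈ 1.67×, not 1.40×; at 0.5·t the factor is exactly 2×.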
Running your example under Windows using MSVC v14.27 I get the following results:

```
table_lookup: sum = 1638054990250, time = 1894
frozen_map:   sum = 1638054990250, time = 664
```
For reference, I compiled with `FROZEN_NO_EXCEPTIONS` and replaced the `plf::nanotimer` machinery with `std::chrono::high_resolution_clock`, as well as `__attribute__ ((noinline))` with `__declspec(noinline)`.
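For anyone making the same port, here is a sketch of the usual compiler shim; the macro name `NOINLINE` is my own choice, not necessarily what the gist uses:

```cpp
// Portability shim for the no-inline attribute; the NOINLINE name is
// illustrative, not taken from the linked gist.
#if defined ( _MSC_VER )
#    define NOINLINE __declspec ( noinline )
#else
#    define NOINLINE __attribute__ ( ( noinline ) )
#endif
```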
What are the statistics behind the values you quote? Is the standard deviation or other measure of error small? And are we sure the compiler is not optimizing the test away (which can be both good and bad)?
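Regarding that last point: a common guard on GCC/Clang is the "escape" trick popularized by Chandler Carruth, sketched below; under MSVC, which lacks this kind of inline asm, writing the result to a `volatile` variable is the usual fallback.

```cpp
// Tells the optimizer that the pointed-to value is observed, so the
// computation that produced it cannot be optimized away (GCC/Clang only).
inline void escape ( void * p ) {
    asm volatile ( "" : : "g" ( p ) : "memory" );
}

// Usage inside the timing loop:
//     int r = table_lookup ( c );
//     escape ( &r );
```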