[compiler] use perfect hashes for switches with small strings
Previously, we compiled a string switch as follow:
1. compute a tag (condition) string hash
2. switch over precomputed hashes of cases
3. inside a switch case, do `equals()` for a string
So, we compile a switch as a 2-level comparison: hash matching
plus equals() call on a string.
For shorter strings we can avoid the third step by using a hash that will not result in collisions and identifies a string completely.
Since we're using uint64_t for switch hashes, we have 8 bits for information.
We can store up to 8 chars inside that without any losses.
Our hash function will be more or less memcpy(&h, s, s_len).
Hash of a single char becomes a value in 0 <= x <= 255 range, which
is great for C++ switch density: it's likely to get that switch compiled
as a jump table (useful for lexers, etc).
Another common case is a shared prefix of all case values.
case 'OPERATION_DELETE'
case 'OPERATION_INSERT'
case 'OPERATION_UPDATE'
All cases above have a length of 16 that is too much for our new hashing.
But if we remove the common OPERATION_ prefix, strings fit into 6 bytes.
When we have such switch with common prefixes and varying suffixes that
fit into 8 bytes, we compare the prefix before hashing, if it does match,
variable cases tail is hashed (DELETE, INSERT, UPDATE from above).
Since optimized form doesn't require extra if statement inside every case, the binary size gets smaller as well (but not dramatically).
In average, we compile 60% of string switches using the optimized scheme. (~50% for short strings plus around ~10% for prefix form)
Benchmark results:
name old time/op new time/op delta
SwitchOnlyDefault 70.0ns ± 0% 29.0ns ± 0% -58.57%
Lexer 119ns ± 1% 78ns ± 2% -35.04%
LexerMiss 87.0ns ± 0% 77.0ns ± 0% -11.49%
LexerComplex 115ns ± 1% 93ns ± 2% -19.43%
LexerComplexMiss 87.0ns ± 0% 74.6ns ± 1% -14.25%
Switch8oneChar 158ns ± 0% 95ns ± 1% -39.81%
Switch8oneCharMiss 108ns ± 1% 102ns ± 1% -5.48%
Switch8oneCharNoDefault 164ns ± 1% 87ns ± 1% -46.67%
Switch8oneCharNoDefaultMiss 117ns ± 0% 97ns ± 1% -17.35%
Switch6perfectHash 258ns ± 1% 204ns ± 1% -21.19%
Switch6perfectHashMiss 130ns ± 0% 119ns ± 1% -8.21%
Switch6perfectHashNoDefault 271ns ± 1% 193ns ± 1% -28.72%
Switch6perfectHashNoDefaultMiss 138ns ± 1% 115ns ± 1% -16.58%
Switch12perfectHash 278ns ± 0% 194ns ± 4% -30.19%
Switch12perfectHashMiss 139ns ± 1% 109ns ± 1% -21.62%
Switch12perfectHashNoDefault 275ns ± 1% 190ns ± 0% -30.74%
Switch12perfectHashNoDefaultMiss 142ns ± 1% 108ns ± 0% -24.05%
Switch6perfectHashWithPrefix 286ns ± 0% 219ns ± 3% -23.40%
Switch6perfectHashWithPrefixMiss 190ns ± 1% 143ns ± 2% -24.67%
Switch6perfectHashWithPrefixNoDefault 293ns ± 0% 230ns ± 1% -21.58%
Switch6perfectHashWithPrefixNoDefaultMiss 194ns ± 1% 153ns ± 2% -20.82%
Switch12perfectHashWithPrefix 303ns ± 1% 225ns ± 2% -25.69%
Switch12perfectHashWithPrefixMiss 159ns ± 0% 113ns ± 0% -28.93%
Switch12perfectHashWithPrefixNoDefault 308ns ± 1% 224ns ± 1% -27.18%
Switch12perfectHashWithPrefixNoDefaultMiss 141ns ± 1% 117ns ± 2% -17.14%
[Geo mean] 171ns 130ns -23.93%
Another change is that we now compile a default-only switch a little bit differently. We don't compute a hash and discard auto-generated variables to avoid C++ compiler warnings.
Refs #418 Fixes #503