quamina
numbits+base128 8-byte full-precision numbers
For background, see #336
The effect is the ability to match the full range of float64, plus a more compact representation with a significant speedup.
BTW there's something wrong with the benchmarks: my local benchmarks, which measure matches/second rather than ns/op, have slightly improved in this merge. Need to investigate.
Codecov Report
Attention: Patch coverage is 97.29730% with 2 lines in your changes missing coverage. Please review.
Project coverage is 96.47%. Comparing base (db62f53) to head (b3b6934). Report is 2 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| value_matcher.go | 86.66% | 2 Missing :warning: |
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@ Coverage Diff @@
## main #349 +/- ##
==========================================
- Coverage 96.59% 96.47% -0.13%
==========================================
Files 19 20 +1
Lines 1940 1899 -41
==========================================
- Hits 1874 1832 -42
- Misses 37 39 +2
+ Partials 29 28 -1
:umbrella: View full report in Codecov by Sentry.
Thanks again @arnehormann, the variable-length numbits work perfectly. The numbits type is still full-length of course, but its toUTF8() method does the shortening. I've verified that for “common” numbers it produces much shorter slices.
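Roughly, the scheme works like this. Here's a minimal sketch of my understanding — toSortableBits and toUTF8ish are illustrative names, not quamina's actual API:

```go
package main

import (
	"fmt"
	"math"
)

// toSortableBits maps a float64 onto a uint64 whose unsigned ordering
// matches numeric ordering (a sketch of the numbits transform).
func toSortableBits(f float64) uint64 {
	u := math.Float64bits(f)
	if u&(1<<63) != 0 {
		return ^u // negative: complement everything
	}
	return u | (1 << 63) // non-negative: just flip the sign bit
}

// toUTF8ish packs the 64 bits into ten big-endian base-128 digits and
// trims trailing zero bytes. Every byte stays below 0x80, so the result
// is valid UTF-8.
func toUTF8ish(f float64) []byte {
	u := toSortableBits(f)
	buf := make([]byte, 10)
	for i := 9; i >= 0; i-- {
		buf[i] = byte(u & 0x7f)
		u >>= 7
	}
	end := len(buf)
	for end > 1 && buf[end-1] == 0 {
		end--
	}
	return buf[:end]
}

func main() {
	for _, f := range []float64{0, 2, 100000, 0.5, -1} {
		fmt.Printf("%8v -> %d byte(s)\n", f, len(toUTF8ish(f)))
	}
}
```

Keeping every byte below 0x80 is what lets the encoded numbers travel through the same UTF-8 byte-at-a-time automaton as ordinary strings.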
We currently don't have a good benchmark to demonstrate a performance improvement, because all of our number benchmarks explore corner cases with weird huge/tiny/high-precision numbers, not the ordinary numbers that would actually be matched in practice. For the output of rand.Float64(), the average length is ~9.7, but those are definitely not everyday numbers.
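A quick way to eyeball that, reusing toUTF8ish from the sketch above (the sketch's digit layout is my assumption, so its average need not land exactly on ~9.7):

```go
// Average shortened length for rand.Float64() output; needs math/rand
// in the imports. Random floats have dense mantissas, so there is
// little trailing zero to trim.
func averageLen(n int) float64 {
	total := 0
	for i := 0; i < n; i++ {
		total += len(toUTF8ish(rand.Float64()))
	}
	return float64(total) / float64(n)
}
```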
But I'm confident that in practice this will give higher performance.
Wow, I really need to update my local copy of the linter. Surprisingly, I think that lint failure is an actual bug; I will try to prove that with a unit test.
...and just FYI (expectation management): this variable-length scheme will only compress positive integers and power-of-two fractions. It won't do much for negative values, because they had to be negated (XORed with all-one bits) in the process to make them lexically sortable. Still worth it: it costs little time and no additional bytes at all.
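In terms of the earlier sketch, the complement makes this easy to see in the raw bits (toSortableBits is the hypothetical helper from above):

```go
// Complementing a negative number turns its trailing zero mantissa bits
// into ones, so there are no trailing zero bytes left to trim.
fmt.Printf("%016x\n", toSortableBits(1))  // bff0000000000000: 52 trailing zero bits
fmt.Printf("%016x\n", toSortableBits(-1)) // 400fffffffffffff: trailing bits all ones
```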
Haha, you're unduly pessimistic. numbits are integer-friendly. For the ints 0 to 100,000 inclusive, here are the counts of shortened-numbits lengths:
1: 1
2: 1
3: 115
4: 7590
5: 92294
None higher than 5, and all <5 until you get over 1K. But you are correct: negative numbers all seem to be length 10.
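For what it's worth, a quick histogram over the sketch encoder from earlier reproduces exactly these counts:

```go
// Histogram of shortened lengths for the integers 0..100000 inclusive,
// reusing toUTF8ish from the sketch above.
counts := map[int]int{}
for i := 0; i <= 100000; i++ {
	counts[len(toUTF8ish(float64(i)))]++
}
for length := 1; length <= 10; length++ {
	if counts[length] > 0 {
		fmt.Printf("%d: %d\n", length, counts[length])
	}
}
```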
This is actually very good because in real life, many (most?) of the numbers that get matched are integers, not too large.
As I said: only positive integers and power-of-two fractions. But positive values can be as little as 2 bytes 😁 Edit: right, it can even be 1 byte 😃
BTW in that last commit I neglected to include the most recent version of numbits, which has an optimized shortening that gives a tiny performance improvement. A little short of sleep these days…