rust-hyperloglog
rust-hyperloglog copied to clipboard
Fixed a few errors and sped up estimation by a large factor
Hi @jedisct1 - I was benchmarking this implementation of HyperLogLog alongside a few others, and I noticed a few significant errors that I thought best to fix. Here is what I changed:
- Fixed indexing error for linear count threshold - before you had forgotten the
-4there, the precision exponent and the threshold were off.
https://github.com/jedisct1/rust-hyperloglog/blob/bf12a8ab19bb26db9f069d2c2463758ed1352aa2/src/lib.rs#L132-L134
- I have fixed the unordered estimates, aligned with the biases
https://github.com/jedisct1/rust-hyperloglog/blob/f190a717d9bef2850c9bd69e415c5cce48a8bdaa/src/weights.rs#L4291-L4313
- Given now that the estimates are sorted, we can use a partition point search to find the neighbours.
https://github.com/jedisct1/rust-hyperloglog/blob/f190a717d9bef2850c9bd69e415c5cce48a8bdaa/src/lib.rs#L157-L181
Other changes are mostly aesthetic to make the code, in my opinion, more readable.
Do note that this pull request needs for the siphash pull request to be merged
Cheers!