rust-hyperloglog icon indicating copy to clipboard operation
rust-hyperloglog copied to clipboard

Fixed a few errors and sped up estimation by a large factor

Open LucaCappelletti94 opened this issue 1 year ago • 0 comments

Hi @jedisct1 - I was benchmarking this implementation of HyperLogLog alongside a few others, and I noticed a few significant errors that I thought best to fix. Here is what I changed:

  1. Fixed indexing error for linear count threshold - before you had forgotten the -4 there, the precision exponent and the threshold were off.

https://github.com/jedisct1/rust-hyperloglog/blob/bf12a8ab19bb26db9f069d2c2463758ed1352aa2/src/lib.rs#L132-L134

  1. I have fixed the unordered estimates, aligned with the biases

https://github.com/jedisct1/rust-hyperloglog/blob/f190a717d9bef2850c9bd69e415c5cce48a8bdaa/src/weights.rs#L4291-L4313

  1. Given now that the estimates are sorted, we can use a partition point search to find the neighbours.

https://github.com/jedisct1/rust-hyperloglog/blob/f190a717d9bef2850c9bd69e415c5cce48a8bdaa/src/lib.rs#L157-L181

Other changes are mostly aesthetic to make the code, in my opinion, more readable.

Do note that this pull request needs for the siphash pull request to be merged

Cheers!

LucaCappelletti94 avatar Aug 10 '24 18:08 LucaCappelletti94