sketchy icon indicating copy to clipboard operation
sketchy copied to clipboard

HLL: add 1 to the number of zeros?

Open jwr opened this issue 8 years ago • 0 comments

Ok, I realize this might be wasting your time, but I can't let go and I have to ask:

In hll/insert*, line 24 https://github.com/bigmlcom/sketchy/blob/fa64f2b739dd2c0d7346a4fe1a8985a2a140c8bc/src/clj/bigml/sketchy/hyper_loglog.clj#L24 you do not add 1 to the number of trailing zeros. Reading the original HLL paper, as well as the subsequent "Understanding the HyperLogLog", it seems they always add 1 (to quote from the original paper: "equivalently one plus the length of the initial run of 0’s").

This makes a big difference, since the buckets are initialized with zeros, so not adding 1 means that in many cases we won't even update the buckets.

I have to ask, because yes, of course I added inc to that line, and quickly discovered that when I do, the tests do not pass and the results are incorrect. So the code works correctly, but why?

jwr avatar Feb 23 '18 13:02 jwr