java-LSH

Why should the number of elements per bucket be at least 100 to get relevant results?

Open snie2012 opened this issue 8 years ago • 0 comments

In the comments of the example LSHMinHash code, it says that 'to get relevant results, the number of elements per bucket should be at least 100'. Why?

I tried specifying a number of buckets such that the average number of elements per bucket is lower than 100, and many buckets turned out to be empty. Does this have to do with the hash function that computes the bucket for each band of the signature? Or is it because a large portion of the signatures are likely to be identical after banding, so they are hashed to the same buckets?
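To make the second hypothesis concrete, here is a minimal standalone sketch of what I mean. It does not use java-LSH at all: `bucketOf` is a hypothetical stand-in for the library's band hashing (I am assuming something like `Arrays.hashCode` reduced modulo the bucket count, which may not match the real implementation), and `makeBands` produces synthetic bands where only a limited number of distinct values exist. If many bands are identical, the number of occupied buckets cannot exceed the number of distinct bands, no matter how many buckets are available. For comparison, `expectedEmptyFraction` gives the share of buckets that would stay empty under an idealized uniform hash, which is already noticeable when the average load per bucket is small.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class BucketOccupancy {

    // Hypothetical stand-in for band hashing (NOT the java-LSH implementation):
    // map one band of a signature to a bucket index in [0, buckets).
    public static int bucketOf(int[] band, int buckets) {
        return Math.floorMod(Arrays.hashCode(band), buckets);
    }

    // Count how many distinct buckets receive at least one band.
    public static int occupiedBuckets(int[][] bands, int buckets) {
        Set<Integer> used = new HashSet<>();
        for (int[] band : bands) {
            used.add(bucketOf(band, buckets));
        }
        return used.size();
    }

    // Build n synthetic bands of the given length, drawn from a pool of only
    // `distinct` different bands, to mimic signatures that coincide after banding.
    public static int[][] makeBands(int n, int distinct, int length, long seed) {
        Random rnd = new Random(seed);
        int[][] pool = new int[distinct][length];
        for (int[] band : pool) {
            for (int i = 0; i < length; i++) {
                band[i] = rnd.nextInt();
            }
        }
        int[][] bands = new int[n][];
        for (int i = 0; i < n; i++) {
            bands[i] = pool[i % distinct];
        }
        return bands;
    }

    // Expected fraction of empty buckets if n items were hashed uniformly and
    // independently into `buckets` buckets (idealized balls-into-bins model).
    public static double expectedEmptyFraction(int n, int buckets) {
        return Math.pow(1.0 - 1.0 / buckets, n);
    }

    public static void main(String[] args) {
        int n = 1000;      // number of signatures (one band each, for simplicity)
        int buckets = 100; // number of buckets
        int length = 5;    // integers per band

        for (int distinct : new int[] {1, 20, 1000}) {
            int used = occupiedBuckets(makeBands(n, distinct, length, 42L), buckets);
            System.out.println(distinct + " distinct bands -> " + used
                    + " of " + buckets + " buckets occupied");
        }
        System.out.println("Uniform model, load 1 per bucket: empty fraction = "
                + expectedEmptyFraction(100, 100));
    }
}
```

With a single distinct band, exactly one bucket is occupied; with 20 distinct bands, at most 20 are. I do not know which effect dominates in the real LSHMinHash code, so this only illustrates the question.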

Thanks in advance!

snie2012 · Aug 22 '17 01:08