java-LSH
Why is it that, to get relevant results, the number of elements per bucket should be at least 100?
The comments in the LSHMinHash example code say that 'to get relevant results, the number of elements per bucket should be at least 100'. Why?
I tried specifying a number of buckets so that the average number of elements per bucket was lower than 100, and it turned out that many buckets were empty. Does this have to do with the hash function that computes the bucket for each band of the signature? Or is it because a large portion of the signatures have identical bands after banding, so they are hashed into the same buckets?
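To separate the two explanations, here is a small simulation. It is a minimal sketch, not java-LSH's actual code: it assumes each band is mapped to a bucket with `Arrays.hashCode` followed by `Math.floorMod`, and that hash behaves uniformly. The class `EmptyBucketDemo` and its methods are hypothetical names for illustration, and only the first band of each signature is hashed. Under uniform hashing, the expected fraction of empty buckets when n items fall into b buckets is (1 - 1/b)^n ≈ e^(-n/b), so some empty buckets are normal at low averages even with a perfect hash function. Identical bands (near-duplicate items) make it much worse, because they can occupy at most one bucket per distinct band value.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo; the bucket function below is an assumed stand-in,
// not necessarily what java-LSH uses internally.
public class EmptyBucketDemo {

    // Assumed bucket function: hash the band's values, then map into [0, buckets).
    static int bucketOf(int[] band, int buckets) {
        return Math.floorMod(Arrays.hashCode(band), buckets);
    }

    // Fraction of buckets that receive no signature when only the first band
    // (the first rowsPerBand values of each signature) is hashed.
    static double emptyFraction(int[][] signatures, int rowsPerBand, int buckets) {
        boolean[] used = new boolean[buckets];
        for (int[] sig : signatures) {
            int[] band = Arrays.copyOfRange(sig, 0, rowsPerBand);
            used[bucketOf(band, buckets)] = true;
        }
        int empty = 0;
        for (boolean u : used) {
            if (!u) {
                empty++;
            }
        }
        return (double) empty / buckets;
    }

    // n signatures filled with independent random values, so their bands almost never coincide.
    static int[][] randomSignatures(int n, int length, Random rnd) {
        int[][] sigs = new int[n][length];
        for (int[] sig : sigs) {
            for (int j = 0; j < length; j++) {
                sig[j] = rnd.nextInt();
            }
        }
        return sigs;
    }

    // n signatures copied from only `distinct` prototypes, mimicking a data set
    // of near-duplicates whose bands are identical.
    static int[][] clusteredSignatures(int n, int length, int distinct, Random rnd) {
        int[][] prototypes = randomSignatures(distinct, length, rnd);
        int[][] sigs = new int[n][];
        for (int i = 0; i < n; i++) {
            sigs[i] = prototypes[i % distinct].clone();
        }
        return sigs;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int buckets = 1000;
        int rowsPerBand = 4;
        int length = 20;
        for (int perBucket : new int[] {1, 5, 100}) {
            int n = perBucket * buckets;
            double observed = emptyFraction(randomSignatures(n, length, rnd), rowsPerBand, buckets);
            double expected = Math.exp(-(double) n / buckets); // (1 - 1/b)^n ≈ e^(-n/b)
            System.out.printf("avg %3d per bucket: empty %.3f (uniform estimate %.3f)%n",
                    perBucket, observed, expected);
        }
        double clustered = emptyFraction(clusteredSignatures(1000, length, 50, rnd), rowsPerBand, buckets);
        System.out.printf("1000 items, 50 distinct bands: empty %.3f%n", clustered);
    }
}
```

With random signatures, an average of 1 element per bucket leaves roughly 37% of buckets empty, while an average of 100 leaves essentially none. With only 50 distinct band values among 1000 items, at least 95% of the buckets stay empty regardless of how good the hash function is.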
Thanks in advance!