cuckoofilter

Filter is unreliable?

Open W-Jie opened this issue 7 years ago • 2 comments

I imported this package into my project, thank you! But I found the filter unreliable. When I load about 500,000 records from the database and filter them with the InsertUnique method, about 4,000 of them return true. Does that mean the data is already duplicated? But the database table has a unique primary key, and I have confirmed that the data is not duplicated in the database.

W-Jie avatar Jun 08 '17 03:06 W-Jie

Will check it out in a bit


seiflotfy avatar Jun 08 '17 08:06 seiflotfy

Two questions:

  1. How big is your filter?
  2. Generally, 4k out of 500k means a false positive rate of about 0.8%. That might be expected with 1-byte fingerprints. I am adding a compact 12-bit fingerprint version that should reduce the false positive rate. But it depends on the size of the cuckoo filter (so back to question 1).

seiflotfy avatar Aug 01 '17 17:08 seiflotfy