cuckoofilter
Filter is unreliable?
I imported this package into my project, thank you! But I found the filter unreliable. When I load about 500,000 records from the database and filter them with the InsertUnique method, about 4,000 of them return true. Does that mean those records are duplicates? The database table has a unique primary key, and I have confirmed that there are no duplicates in the database.
Will check it out in a bit
(Sent by email in reply to W-Jie's post of 8 Jun 2017, 05:00 +0200.)
2 Questions
- How big is your filter?
- Generally, 4k out of 500k means a false positive rate of about 0.8%. That might be expected with 1-byte fingerprints. I am adding a compact 12-bit fingerprint version that should reduce the false positive rate. But it depends on the size of the cuckoo filter (so back to question 1).
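To put the numbers above in context, here is a rough sketch of the arithmetic. It compares the observed rate (4,000 out of 500,000) with the commonly cited upper bound for a cuckoo filter at full load, 1 - (1 - 2^-f)^(2b), where f is the fingerprint size in bits and b is the number of slots per bucket. The bucket size of 4 is an assumption for illustration, not something confirmed in this thread, and the helper names `falsePositiveBound` and `observedRate` are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveBound approximates the worst-case false positive rate of a
// cuckoo filter at full load: a lookup compares against up to 2*bucketSize
// stored fingerprints, each matching by chance with probability 2^-bits.
func falsePositiveBound(fingerprintBits, bucketSize int) float64 {
	perSlot := 1.0 / math.Pow(2, float64(fingerprintBits))
	return 1 - math.Pow(1-perSlot, float64(2*bucketSize))
}

// observedRate is the fraction of lookups that reported a match.
func observedRate(matches, total int) float64 {
	return float64(matches) / float64(total)
}

func main() {
	const bucketSize = 4 // assumed slots per bucket, not confirmed by the thread

	fmt.Printf("observed:        %.4f%%\n", 100*observedRate(4000, 500000))
	fmt.Printf("bound,  8 bits:  %.4f%%\n", 100*falsePositiveBound(8, bucketSize))
	fmt.Printf("bound, 12 bits:  %.4f%%\n", 100*falsePositiveBound(12, bucketSize))
}
```

Under these assumptions the observed rate sits below the 8-bit bound, which fits the remark that it might be expected with 1-byte fingerprints. The real rate also depends on how full the filter is, which is why the filter size matters.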