elfkit icon indicating copy to clipboard operation
elfkit copied to clipboard

Bloom filter crate is buggy and not maintained

Open mfil opened this issue 6 years ago • 3 comments

The bloom crate has a bug which was reported in 2017. There were no actions in the repo for three years.

Briefly, the bug is as follows: one needs to compute k hash values for an element added to the filter. In order to cut down the number of hash function calls, the crate computes two independent hashes and uses them as a seed for a RNG and then samples k values from it. I think that the basic idea is sound, but they botched the implementation of the RNG. See the next method for HashIter in src/hashing.rs. All RNG outputs after the first are a multiple of h2 modulo 2^64.

I'm sorry to say that I don't know a good alternative for the crate. I've been looking at several other bloom filter implementations in Rust for work and none of them inspired much confidence, so we ultimately decided to roll our own.

mfil avatar Jul 24 '19 22:07 mfil

thanks for the hint. luckily elfkit only uses bloom filters in one place, that might as well be a hashset.

Could you point me at your implementation? Happy to check it out

aep avatar Jul 25 '19 06:07 aep

We're just getting started on our implementation and I don't know if our employer will let us open-source it. Sorry!

mfil avatar Jul 25 '19 07:07 mfil

If you don't need bloom filters specifically, you could also check out cuckoofilters: https://crates.io/crates/cuckoofilter

They're supposed to have better performance.

mfil avatar Jul 29 '19 11:07 mfil