bloomfilter.js
bloomfilter.js copied to clipboard
Use optimal m/k given n/p option
I was just wondering why you don't have an option to use the optimal m and k parameters (see here) based on n and p.
There are a couple ways I imagine this going:
- Add an optional 3rd argument in the constructor that when present and set to
truewould interpret the first and second values of the constructor as n and p instead of m and k. - Change the constructor to accept a hash of either {m: blah, k: blah} or {n: blah, k: blah} (hopefully with better names than the single letters.
- Add a new function, something like BloomFilter.useOptimal(n, p).
I could implement any of these, but I wanted to ask you first if there was any reason why you didn't do this.
Great idea! I originally didn't include n and p as I was inspired by https://github.com/willf/bloom, which includes an empirical false positive estimator rather than relying on theory. I'm not entirely sure why he does this, but it's probably because FNK isn't as good as a theoretically perfect hash function (although it is very fast and hence "good enough" for bloom filters).
I plan on making the hash function configurable anyway, so I'll add support for this at the same time.
The other idea I had was the ability to set a fixed false positive rate, and have the bit array automatically grow (perhaps by doubling each time) to keep the rate below a fixed value.
Ah, I see. Excellent!
Apparently this guy implemented the latter idea you mentioned. I haven't looked at his implementation, but maybe you can take inspiration/code from there :)
I implemented such an approach on a fork of mine : https://github.com/Phyks/bloomfilter.js
This uses capacity and error_rate instead of m and k as in the current implementation.
@jasondavies It might not be well fitted for a direct PR, and I'm not sure about my code, but it does the job AFAIK, and passes my tests successfully. Thanks for writing the original code !