bloomfilter.js icon indicating copy to clipboard operation
bloomfilter.js copied to clipboard

Use optimal m/k given n/p option

Open Glench opened this issue 13 years ago • 3 comments
trafficstars

I was just wondering why you don't have an option to use the optimal m and k parameters (see here) based on n and p.

There are a couple ways I imagine this going:

  • Add an optional 3rd argument in the constructor that when present and set to true would interpret the first and second values of the constructor as n and p instead of m and k.
  • Change the constructor to accept a hash of either {m: blah, k: blah} or {n: blah, k: blah} (hopefully with better names than the single letters.
  • Add a new function, something like BloomFilter.useOptimal(n, p).

I could implement any of these, but I wanted to ask you first if there was any reason why you didn't do this.

Glench avatar Jan 05 '12 20:01 Glench

Great idea! I originally didn't include n and p as I was inspired by https://github.com/willf/bloom, which includes an empirical false positive estimator rather than relying on theory. I'm not entirely sure why he does this, but it's probably because FNK isn't as good as a theoretically perfect hash function (although it is very fast and hence "good enough" for bloom filters).

I plan on making the hash function configurable anyway, so I'll add support for this at the same time.

The other idea I had was the ability to set a fixed false positive rate, and have the bit array automatically grow (perhaps by doubling each time) to keep the rate below a fixed value.

jasondavies avatar Jan 05 '12 21:01 jasondavies

Ah, I see. Excellent!

Apparently this guy implemented the latter idea you mentioned. I haven't looked at his implementation, but maybe you can take inspiration/code from there :)

Glench avatar Jan 05 '12 21:01 Glench

I implemented such an approach on a fork of mine : https://github.com/Phyks/bloomfilter.js

This uses capacity and error_rate instead of m and k as in the current implementation.

@jasondavies It might not be well fitted for a direct PR, and I'm not sure about my code, but it does the job AFAIK, and passes my tests successfully. Thanks for writing the original code !

Phyks avatar Oct 27 '14 21:10 Phyks