python-bloomfilter icon indicating copy to clipboard operation
python-bloomfilter copied to clipboard

Added nstar and nstar_intersection functions as well as tests.

Open adewes opened this issue 9 years ago • 7 comments

I've implemented two functions nstar and nstar_intersection that estimate the number of elements in a Bloom filter as well as in an intersection of two Bloom filters (a separate function is required for the latter case since using nstar() on the intersection directly would yield a wrong result). This is very useful in many circumstances, e.g. when performing similarity searches on hierarchical data structures.

The implementation follows the Wikipedia article:

http://en.wikipedia.org/wiki/Bloom_filter

I've added tests and a short explanation as well, please let me know if you need anything else in order to merge this.

adewes avatar Apr 01 '15 17:04 adewes

btw @jaybaird thanks for this project, it's really useful :)

adewes avatar Apr 01 '15 17:04 adewes

Andreas,

Maybe I'm misunderstanding what this does, but why would you use this over using len? Or in the case of the intersection, doing the intersection and taking the len() there?

I do see now actually that we don't update the count when we union/intersect filters, which is a bug and might better be solved by this. Can you point me to the relevant section in the Wikipedia article you're referencing so I can catch up?

Thanks!

jaybaird avatar Apr 01 '15 18:04 jaybaird

Nevermind, I found it :)

jaybaird avatar Apr 01 '15 18:04 jaybaird

Hey @jaybaird, the purpose of nstar and nstar_intersection is to provide an estimate of the number of elements in the filter, which is useful if you reconstruct the Bloom filter from the bit array alone (and thus not have access to the count attribute). Also, obtaining a count for an intersection or union of two Bloom filters is not possible using the count attribute, so we need to estimate it.

adewes avatar Apr 01 '15 21:04 adewes

@jaybaird is there anything you need from my side to merge this PR or come to a decision on whether to integrate this into the code base?

adewes avatar Apr 08 '15 15:04 adewes

I plan on taking a look at this this weekend. I'm still not 100% sure it's utility and there's a larger bug that I realized needs to be fixed but this may be the solution for. I'll keep you posted.

jaybaird avatar Apr 10 '15 18:04 jaybaird

hey @jaybaird , still planning to merge this?

adewes avatar Aug 25 '15 20:08 adewes