bloom-filter-scala icon indicating copy to clipboard operation
bloom-filter-scala copied to clipboard

Stable Bloom filter

Open alexandrnikitin opened this issue 9 years ago • 2 comments
trafficstars

Stable Bloom filter

alexandrnikitin avatar Jun 29 '16 07:06 alexandrnikitin

Here are some alternative JVM based implementations as inspirations ;)

A. https://github.com/jjedele/stable-bloom-filter/blob/master/src/main/java/de/jjedele/sbf/StableBloomFilter.java

B. https://github.com/alexander-chervony/stable-bloom-filter/blob/master/src/main/scala/StableBloomFilter.scala

C. https://github.com/mayconbordin/streaminer/blob/master/src/main/java/org/streaminer/stream/membership/StableBloomFilter.java

Looking at these, seems like we could use the same approach as example A and use UnsafeBitArray instead of byte[]

kutchar avatar Sep 09 '16 18:09 kutchar

@kutchar Thanks for the info. BTW, @alexander-chervony is a colleague of mine and yes, they are all essentially ports of one lib. The problem with all of them is that there's no math from the paper. They all expect calculated values. One more crucial thing is that they use int to refresh an element which is huge waste of memory. While the recommendation is to pick the Max value as small as possible. "In practice, we limit our choice of Max to 1, 3 and 7 (if higher time cost can be tolerated, larger values of Max can be tried similarly)." Which is just few bits not 4 bytes.

I pushed some code ripped out from our private repo. https://github.com/alexandrnikitin/bloom-filter-scala/commit/867596602870d562061e936340e61da88c8da857 I really feel ashamed for doing that but I don't want to protract anymore. It's Java, not polished and not finished yet. The math is implemented in straightforward way from the original paper and very slooow, it needs some math love. I put an excerpt from the paper in the comments. The only way to use it at the moment is to adjust parameters by hand for your input.

alexandrnikitin avatar Sep 15 '16 08:09 alexandrnikitin