dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

Add bloom/cuckoo filter support

Open e271828- opened this issue 2 years ago • 9 comments

Is your feature request related to a problem? Please describe. The redis bloom module is very popular. Probabilistic data structures are required for many problems.

Describe the solution you'd like Basic bloom / counting bloom support at minimum. Ideally full compatibility with redis bloom module.

Describe alternatives you've considered Could do lua bitfields, but bitfield isn't supported either and would be slower.

e271828- avatar May 22 '23 12:05 e271828-

@romange any plans on this?

e271828- avatar May 25 '23 20:05 e271828-

could you provide some context how you use this functionality and why you prefer using Dragonfly with this over Redis?

romange avatar May 25 '23 21:05 romange

could you provide some context how you use this functionality and why you prefer using Dragonfly with this over Redis?

Because sharding redis is inefficient in scenarios where scale-up would suffice.

If you're trying to be compatible with Redis, you'll need to either support modules or build the most popular ones into core. Probabilistic data structures have thousands of applications.

e271828- avatar May 26 '23 13:05 e271828-

Our architecture is entirely different that of Redis, so we can not support Redis Modules. Supporting Redis Bloom filter is currently a low priority and not scheduled for 2023.

romange avatar May 27 '23 13:05 romange

OK, thanks for letting us know. Sounds like dragonfly is not going to be a great fit for many use cases this year.

e271828- avatar May 28 '23 19:05 e271828-

I would like to see bloom filter support in dragonflydb as well. Can we reopen this?

jbergstroem avatar Mar 27 '24 14:03 jbergstroem

@romange FYI.

e271828- avatar Mar 27 '24 15:03 e271828-

The Bloom filter library is pretty big. Do you have specific APIs/commands in mind?

romange avatar Mar 27 '24 15:03 romange

The Bloom filter library is pretty big. Do you have specific APIs/commands in mind?

@romange Supporting basic Bloom commands would be a good start: BF.RESERVE, BF.ADD, BF.EXISTS, BF.MADD, BF.MEXISTS

I'd suggest adding a modern counting cuckoo filter as well:

CCF.RESERVE, CCF.ADD, CCF.DELETE, CCF.EXISTS, CCF.COUNT, CCF.MADD, CCF.MDELETE, CCF.MEXISTS, CCF.MCOUNT

These two together would cover 99% of use cases.

Bloom conformity test:

> BF.RESERVE bikes:models 0.001 1000000
OK
> BF.ADD bikes:models "Smoky Mountain Striker"
(integer) 1
> BF.EXISTS bikes:models "Smoky Mountain Striker"
(integer) 1
> BF.MADD bikes:models "Rocky Mountain Racer" "Cloudy City Cruiser" "Windy City Wippet"
1) (integer) 1
2) (integer) 1
3) (integer) 1
> BF.MEXISTS bikes:models "Rocky Mountain Racer" "Cloudy City Cruiser" "Windy City Wippet"
1) (integer) 1
2) (integer) 1
3) (integer) 1

Counting cuckoo conformity test:

> CCF.RESERVE bikes:inventory 1000000
OK
> CCF.ADD bikes:inventory "Smoky Mountain Striker"
(integer) 1
> CCF.EXISTS bikes:inventory "Smoky Mountain Striker" 
(integer) 1
> CCF.COUNT bikes:inventory "Smoky Mountain Striker"
(integer) 1
> CCF.ADD bikes:inventory "Smoky Mountain Striker"
(integer) 1
> CCF.COUNT bikes:inventory "Smoky Mountain Striker" 
(integer) 2
> CCF.DELETE bikes:inventory "Smoky Mountain Striker"
(integer) 1
> CCF.EXISTS bikes:inventory "Smoky Mountain Striker"
(integer) 1
> CCF.COUNT bikes:inventory "Smoky Mountain Striker"
(integer) 1
> CCF.DELETE bikes:inventory "Smoky Mountain Striker"
(integer) 1
> CCF.EXISTS bikes:inventory "Smoky Mountain Striker"
(integer) 0
> CCF.MADD bikes:inventory "Rocky Mountain Racer" "Cloudy City Cruiser" "Windy City Wippet"
1) (integer) 1
2) (integer) 1
3) (integer) 1
> CCF.MEXISTS bikes:inventory "Rocky Mountain Racer" "Cloudy City Cruiser" "Windy City Wippet"
1) (integer) 1
2) (integer) 1 
3) (integer) 1
> CCF.MCOUNT bikes:inventory "Rocky Mountain Racer" "Cloudy City Cruiser" "Windy City Wippet"
1) (integer) 1
2) (integer) 1
3) (integer) 1
> CCF.MDELETE bikes:inventory "Rocky Mountain Racer" "Cloudy City Cruiser"
1) (integer) 1
2) (integer) 1
> CCF.MEXISTS bikes:inventory "Rocky Mountain Racer" "Cloudy City Cruiser" "Windy City Wippet" 
1) (integer) 0
2) (integer) 0
3) (integer) 1

e271828- avatar Mar 27 '24 17:03 e271828-

bf.* are supported in v1.16.0. persistence support is coming for the next version

romange avatar Apr 03 '24 06:04 romange

@romange thanks! What about a counting filter? Can split this issue into a new one.

e271828- avatar Apr 09 '24 10:04 e271828-

@e271828- we currently do not have plans to implement the countring filter. We may consider contributions from the community.

romange avatar Apr 09 '24 12:04 romange

btw, we support hyper log log api.

romange avatar Apr 09 '24 12:04 romange

btw, we support hyper log log api.

@romange HLL cannot replace CBF/CCF. If you need membership, deletion, or exact counts (all common requirements) then you will still be out of luck. Here are benchmarks of modern sketches with tradeoffs. src

We may consider contributions from the community.

Maybe if you port to Rust ;)

e271828- avatar Apr 09 '24 12:04 e271828-

I know Professor Roy Fridman and I agree it's super interesting to implement those data structures. I guess we will have to wait for someone who is willing to use his c++ skills and contribute for this task 🤷🏼

romange avatar Apr 09 '24 15:04 romange