
DRAFT Add probabilistic in-memory caches bloom and cuckoo for dedup/cached with sharding

Open • peczenyj opened this issue 2 years ago • 1 comment

I added support for two probabilistic filters (Bloom and cuckoo) that can be used as caches by the dedupe processor.
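To illustrate how a Bloom filter can act as a dedupe cache, here is a minimal sketch using only the Go standard library. The `Bloom` type, `NewBloom`, and `TestAndAdd` are hypothetical names for illustration, not the API added in this PR. The filter derives k bit positions per key by double hashing two FNV hashes. It can report a false positive (a new message treated as a duplicate) but never a false negative. A cuckoo filter makes a similar trade-off and additionally supports deleting keys; it is not shown here.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a hypothetical minimal Bloom filter: m bits, k probes per key.
type Bloom struct {
	bits []uint64
	m    uint64
	k    uint64
}

// NewBloom allocates m bits (rounded up to a multiple of 64) and k probes.
func NewBloom(m, k uint64) *Bloom {
	words := (m + 63) / 64
	return &Bloom{bits: make([]uint64, words), m: words * 64, k: k}
}

// positions derives k bit indexes via double hashing: h1 + i*h2 (mod m).
func (b *Bloom) positions(key []byte) []uint64 {
	ha := fnv.New64a()
	ha.Write(key)
	h1 := ha.Sum64()
	hb := fnv.New64()
	hb.Write(key)
	// Odd step: never zero, and keeps the k probes distinct when m is a power of two.
	h2 := hb.Sum64() | 1
	out := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		out[i] = (h1 + i*h2) % b.m
	}
	return out
}

// TestAndAdd reports whether key was probably seen before, then records it.
// true may be a false positive; false is always correct.
func (b *Bloom) TestAndAdd(key []byte) bool {
	seen := true
	for _, p := range b.positions(key) {
		word, mask := p/64, uint64(1)<<(p%64)
		if b.bits[word]&mask == 0 {
			seen = false
			b.bits[word] |= mask
		}
	}
	return seen
}

func main() {
	f := NewBloom(1<<16, 4) // 65536 bits, 4 probes per key
	for _, msg := range []string{"a", "b", "a", "c", "b"} {
		if f.TestAndAdd([]byte(msg)) {
			fmt.Println("drop duplicate:", msg)
		} else {
			fmt.Println("keep:", msg)
		}
	}
}
```

The memory cost is fixed at m bits regardless of how many messages pass through, which is the main attraction over an exact key/value cache when deduplicating very large streams.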

Optionally, these caches can dump their state to the filesystem and restore it later. This is useful to avoid starting the daemon with an empty cache.

Motivation: I need to deduplicate a huge amount of data, and this pull request may save memory and other resources.

This MR could later be combined with https://github.com/benthosdev/benthos/pull/2123.

peczenyj, Aug 04 '23 15:08

@mihaitodor, do you think it is OK now? @Jeffail, what do you think?

peczenyj, Sep 05 '23 17:09