databend

Feature: Try Support BloomFilter Collision

Open JackTan25 opened this issue 1 year ago • 2 comments

Summary: We can check the runtime filter's bloom filter against each Parquet block's bloom filter index (a bloom filter "collision") to prune blocks at the storage level. This would give Parquet reads more chances to filter out data.
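A minimal sketch of what such a collision check could look like, assuming both filters use the same bit-array size and the same hash functions (otherwise their bits are not comparable). The names `BloomFilter`, `build_filter`, and `can_prune` are hypothetical and are not Databend APIs, and the toy filter does not model Parquet's actual bloom filter layout:

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: m bits stored in a Python int, k positions per key."""

    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        self.bits = 0

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_filter(keys, m=1024, k=3):
    """Build a filter over an iterable of keys (hypothetical helper)."""
    bf = BloomFilter(m, k)
    for key in keys:
        bf.add(key)
    return bf


def can_prune(runtime_filter, block_filter):
    """Return True only when the block provably holds no runtime-filter key."""
    if (runtime_filter.m, runtime_filter.k) != (block_filter.m, block_filter.k):
        return False  # incompatible filters: stay conservative and read the block
    # A key present on both sides would set the same bits in both filters,
    # so an empty intersection proves the two key sets are disjoint.
    return (runtime_filter.bits & block_filter.bits) == 0
```

The check is one-sided: a non-empty intersection does not prove that a common key exists, so the block still has to be read in that case. The denser either filter is, the less often the intersection comes out empty.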

JackTan25, Mar 13 '24 07:03

cc @dantengsky

JackTan25, Mar 13 '24 07:03

Reference: https://openproceedings.org/2023/conf/edbt/paper-190.pdf. Regarding https://github.com/datafuselabs/databend/pull/14970, we found that in some cases the false positive rate is very high, so we can't prune blocks as expected. We could introduce BloomRF, which is more recent than the SuRF paper, to solve this.
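For intuition on why the false positive rate can get this high, here is a small sketch using the standard textbook approximation for a classic bloom filter, p ≈ (1 - e^(-kn/m))^k. It is not taken from the PR or the paper, and the function name and the parameter values used with it are hypothetical:

```python
import math


def false_positive_rate(n, m, k):
    """Approximate false positive rate of a classic bloom filter.

    n: number of distinct keys inserted
    m: number of bits in the filter
    k: number of hash functions
    """
    return (1.0 - math.exp(-k * n / m)) ** k


# Same filter size, increasing number of keys (illustrative values only).
for n in (10, 100, 1000):
    print(n, false_positive_rate(n, m=1024, k=3))
```

With the filter size fixed, the rate climbs quickly as more distinct keys are inserted, which is when block pruning stops being effective.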

JackTan25, Apr 01 '24 06:04