
[SPARK-40248][SQL] Use larger number of bits to build Bloom filter

Open wangyum opened this issue 2 years ago • 2 comments

What changes were proposed in this pull request?

This PR makes the Bloom filter join use a larger number of bits to build the Bloom filter when the row count is available.
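To illustrate why the number of bits matters, here is a minimal sketch using the standard textbook Bloom filter sizing formulas. It is not the code from this PR: the function names, the assumed default item count, the row count, and the target false-positive rate are all hypothetical values chosen for illustration.

```python
import math


def optimal_num_bits(expected_items: int, fpp: float) -> int:
    """Textbook sizing: m = -n * ln(p) / (ln 2)^2, rounded up."""
    return math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2))


def optimal_num_hashes(expected_items: int, num_bits: int) -> int:
    """Textbook hash count: k = (m / n) * ln 2, at least 1."""
    return max(1, round(num_bits / expected_items * math.log(2)))


def expected_fpp(actual_items: int, num_bits: int, num_hashes: int) -> float:
    """Approximate false-positive rate after inserting actual_items keys."""
    return (1.0 - math.exp(-num_hashes * actual_items / num_bits)) ** num_hashes


# Hypothetical numbers, not taken from the PR or from Spark defaults.
ASSUMED_DEFAULT_ITEMS = 1_000_000   # size used when no row count is known
ACTUAL_ROWS = 50_000_000            # row count reported by statistics
TARGET_FPP = 0.03

# Filter sized for the assumed default, then filled with the real rows.
small_bits = optimal_num_bits(ASSUMED_DEFAULT_ITEMS, TARGET_FPP)
small_k = optimal_num_hashes(ASSUMED_DEFAULT_ITEMS, small_bits)
undersized_fpp = expected_fpp(ACTUAL_ROWS, small_bits, small_k)

# Filter sized from the actual row count.
large_bits = optimal_num_bits(ACTUAL_ROWS, TARGET_FPP)
large_k = optimal_num_hashes(ACTUAL_ROWS, large_bits)
right_sized_fpp = expected_fpp(ACTUAL_ROWS, large_bits, large_k)

print(f"undersized:  {small_bits} bits, fpp ~ {undersized_fpp:.4f}")
print(f"right-sized: {large_bits} bits, fpp ~ {right_sized_fpp:.4f}")
```

Under these assumptions, the undersized filter saturates and passes almost every probe row, while the filter sized from the row count stays near the target rate. That is the general effect a larger number of bits is meant to achieve.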

Why are the changes needed?

This fixes a case where the Bloom filter join fails to filter out more data when CBO is enabled. For example, TPC-DS q64:

[Two rows of screenshots in two columns, "CBO is enabled" and "CBO is disabled"; images not preserved]

After this PR:

[Two screenshots, "Build bloom filter" and "Filter data"; images not preserved]

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

wangyum avatar Aug 28 '22 13:08 wangyum

cc @sigmod @cloud-fan

wangyum avatar Aug 29 '22 01:08 wangyum

cc @andylam-db @yunxiaoma-db

sigmod avatar Sep 06 '22 16:09 sigmod

cc @cloud-fan @sigmod

wangyum avatar Oct 21 '22 04:10 wangyum

Merged to master.

wangyum avatar Nov 02 '22 10:11 wangyum