[SPARK-40248][SQL] Use larger number of bits to build Bloom filter
What changes were proposed in this pull request?
This PR makes the Bloom filter join use a larger number of bits to build the Bloom filter when the row count is available.
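Why more bits help can be seen from the standard Bloom filter sizing formulas. The following is a minimal Python sketch, not Spark's actual Scala/Java code. It sizes a filter with m = -n·ln(p)/(ln 2)² bits for n expected items and a target false-positive probability p, and estimates the resulting false-positive rate as (1 - e^(-kn/m))^k. The helper `bits_for_row_count`, its parameters `max_num_items` and `max_num_bits`, and the 0.03 default are illustrative assumptions. The sketch shows one hypothetical way to spend more bits when a row count is known, and it is not necessarily the exact rule this PR implements.

```python
import math

# Assumed default false-positive probability (illustrative).
DEFAULT_FPP = 0.03


def optimal_num_bits(n: int, fpp: float) -> int:
    """Standard sizing: m = -n * ln(p) / (ln 2)^2 bits for n items."""
    return int(-n * math.log(fpp) / (math.log(2) ** 2))


def optimal_num_hashes(n: int, m: int) -> int:
    """Standard choice of hash functions: k = round(m / n * ln 2), at least 1."""
    return max(1, round(m / n * math.log(2)))


def expected_fpp(n: int, m: int) -> float:
    """Approximate false-positive rate (1 - e^(-k*n/m))^k for n items in m bits."""
    k = optimal_num_hashes(n, m)
    return (1 - math.exp(-k * n / m)) ** k


def bits_for_row_count(row_count: int, max_num_items: int, max_num_bits: int) -> int:
    """Hypothetical rule: lower the target fpp for small row counts, then cap the size."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    fpp = min(row_count / (max_num_items / DEFAULT_FPP), DEFAULT_FPP)
    return min(optimal_num_bits(row_count, fpp), max_num_bits)


# Usage: compare default-fpp sizing with the hypothetical larger sizing.
row_count = 100_000
baseline = optimal_num_bits(row_count, DEFAULT_FPP)
larger = bits_for_row_count(row_count, max_num_items=4_000_000, max_num_bits=67_108_864)
print(baseline, expected_fpp(row_count, baseline))
print(larger, expected_fpp(row_count, larger))
```

A filter with more bits per item has a lower false-positive rate, so fewer non-matching rows pass through on the probe side. The cap keeps the filter's memory bounded when the row count is large.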
Why are the changes needed?
To fix an issue where the Bloom filter join filters out less data when CBO is enabled than when it is disabled. For example, TPC-DS q64:
[Screenshots: CBO is enabled | CBO is disabled]
After this PR:
[Screenshots: Build bloom filter | Filter data]
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.
cc @sigmod @cloud-fan
cc @andylam-db @yunxiaoma-db
cc @cloud-fan @sigmod
Merged to master.