[spark] Support auto disable bucketed scan
Purpose
This PR adds a new rule, DisableUnnecessaryPaimonBucketedScan, that automatically disables bucketed scan when it is not actually effective, i.e., when no shuffle exchange is removed by it. This avoids a performance regression, since a bucketed scan may have lower parallelism than a normal scan.
For example: a table is bucketed on key x, but the user joins, groups by, or partitions by column y.
Note: this rule is inspired by Spark's DisableUnnecessaryBucketedScan, but works for v2 scans.
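To make the x-versus-y example concrete, here is a rough toy model of the decision in Java. It is not the code in this PR and not Spark's rule: the class `BucketedScanSketch` and the method `keepBucketedScan` are hypothetical names, and the check assumes the simplified condition that a bucketed scan can only save a shuffle when every bucket key appears among the keys the consuming operator clusters on.

```java
import java.util.List;

// Toy model only: hypothetical names, not Paimon's or Spark's actual API.
final class BucketedScanSketch {

    // Assumption: the bucketed layout can replace a shuffle only if all bucket
    // keys are among the keys the downstream join/group-by/window clusters on.
    static boolean keepBucketedScan(List<String> bucketKeys, List<String> requiredClusteringKeys) {
        return !bucketKeys.isEmpty() && requiredClusteringKeys.containsAll(bucketKeys);
    }

    public static void main(String[] args) {
        // Table bucketed on x, query groups by y: the shuffle is still needed,
        // so the bucketed scan only costs parallelism and should be disabled.
        System.out.println(keepBucketedScan(List.of("x"), List.of("y")));

        // Table bucketed on x, query groups by x: the shuffle can be removed,
        // so the bucketed scan is kept.
        System.out.println(keepBucketedScan(List.of("x"), List.of("x")));
    }
}
```

The real rule works on the physical plan rather than on key lists, so this only illustrates the intent described above.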
Tests
Add test.
API and Format
no
Documentation
It seems the Spark tests failed.
@JingsongLi thank you for the reminder, it took me a while to find the root cause...
@JingsongLi @YannByron do you have time to take a look? Thank you.
+1 Thanks @ulysses-you for the contribution. Merging...