
[spark] Support auto disable bucketed scan

Open ulysses-you opened this issue 1 year ago • 1 comments

Purpose

This PR adds a new rule, DisableUnnecessaryPaimonBucketedScan, that automatically disables bucketed scan when it is not actually effective, i.e., when no shuffle exchange has been removed. This avoids a performance regression, since a bucketed scan may have lower parallelism than a normal scan.

For example: a table is bucketed by key x, but the user joins, groups by, or partitions by column y.

Note: this rule is inspired by Spark's DisableUnnecessaryBucketedScan, but works for v2 scans.
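
To make the idea concrete, here is a minimal toy sketch of the decision such a rule makes. It is not the code in this PR and does not use any Spark or Paimon API: the `Scan`, `Exchange`, and `Aggregate` classes and the function `disable_unnecessary_bucketed_scan` are hypothetical stand-ins for physical plan nodes, and the "bucket keys are a subset of the required keys" check is a simplifying assumption about when a bucketed layout satisfies an operator's distribution requirement.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Scan:
    """Hypothetical leaf node: a table scan that may read bucket by bucket."""
    table: str
    bucket_keys: Tuple[str, ...]
    bucketed: bool = True


@dataclass(frozen=True)
class Exchange:
    """Hypothetical shuffle node: repartitions its child by `keys`."""
    keys: Tuple[str, ...]
    child: object


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical operator that needs its input clustered by `group_keys`."""
    group_keys: Tuple[str, ...]
    child: object


def disable_unnecessary_bucketed_scan(plan, required: Optional[Tuple[str, ...]] = None):
    """Turn off bucketed scan wherever it did not save a shuffle.

    `required` holds the clustering keys of the nearest ancestor that needs a
    distribution, or None if there is no such ancestor or an Exchange sits in
    between (in which case the shuffle was not removed).
    """
    if isinstance(plan, Scan):
        # Assumption: the bucketed layout is useful only if its keys are
        # covered by the keys the ancestor operator clusters on.
        useful = (
            plan.bucketed
            and required is not None
            and bool(plan.bucket_keys)
            and set(plan.bucket_keys) <= set(required)
        )
        return replace(plan, bucketed=useful)
    if isinstance(plan, Exchange):
        # A shuffle below the operator means bucketing did not help here.
        return replace(plan, child=disable_unnecessary_bucketed_scan(plan.child, None))
    if isinstance(plan, Aggregate):
        return replace(
            plan, child=disable_unnecessary_bucketed_scan(plan.child, plan.group_keys)
        )
    raise TypeError(f"unknown plan node: {plan!r}")


# Table bucketed by x, grouped by y: the planner still had to add a shuffle.
group_by_y = Aggregate(("y",), Exchange(("y",), Scan("t", ("x",))))
# Table bucketed by x, grouped by x: no shuffle was needed.
group_by_x = Aggregate(("x",), Scan("t", ("x",)))

rewritten_y = disable_unnecessary_bucketed_scan(group_by_y)
rewritten_x = disable_unnecessary_bucketed_scan(group_by_x)
```

In this toy model, the scan in `rewritten_y` ends up with `bucketed=False`, so it can be split with normal parallelism, while the scan in `rewritten_x` keeps `bucketed=True` because the bucketed layout is what allowed the shuffle to be skipped.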

Tests

Add test.

API and Format

no

Documentation

ulysses-you avatar Aug 09 '24 08:08 ulysses-you

It seems the Spark tests failed.

JingsongLi avatar Aug 11 '24 11:08 JingsongLi

@JingsongLi thank you for the reminder. It took me a while to find the root cause...

ulysses-you avatar Aug 12 '24 01:08 ulysses-you

@JingsongLi @YannByron do you have time to take a look? Thank you.

ulysses-you avatar Aug 12 '24 10:08 ulysses-you

+1 Thanks @ulysses-you for the contribution. Merging...

JingsongLi avatar Aug 16 '24 06:08 JingsongLi