spark
spark copied to clipboard
[SPARK-40017][SQL] Adaptive adjustment `spark.sql.adaptive.advisoryPartitionSizeInBytes`
What changes were proposed in this pull request?
This PR enhances CoalesceShufflePartitions to adaptive adjustment spark.sql.adaptive.advisoryPartitionSizeInBytes according to the ExpandExec operator that follows. For example:
SELECT
item.i_brand_id brand_id,
item.i_brand brand,
count(distinct ss_ext_sales_price),
count(distinct ss_addr_sk)
FROM store_sales, item
WHERE store_sales.ss_item_sk = item.i_item_sk
GROUP BY item.i_brand, item.i_brand_id
| Before | After |
|---|---|
![]() |
![]() |
Why are the changes needed?
Improve parallelism.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.
cc @cloud-fan
@maryannxue FYI
I'm afraid this is more of an ad-hoc change specific to one type of query or scenario. The heuristics might now work well with other cases.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

