spark [SPARK-40017][SQL] Adaptive adjustment `spark.sql.adaptive.advisoryPartitionSizeInBytes`

[SPARK-40017][SQL] Adaptive adjustment `spark.sql.adaptive.advisoryPartitionSizeInBytes`

Open wangyum opened this issue 3 years ago • 2 comments

What changes were proposed in this pull request?

This PR enhances CoalesceShufflePartitions to adaptive adjustment spark.sql.adaptive.advisoryPartitionSizeInBytes according to the ExpandExec operator that follows. For example:

SELECT                                          
  item.i_brand_id brand_id,                     
  item.i_brand brand,                           
  count(distinct ss_ext_sales_price),          
  count(distinct ss_addr_sk)                   
FROM store_sales, item                          
WHERE store_sales.ss_item_sk = item.i_item_sk
GROUP BY item.i_brand, item.i_brand_id

Before	After

Why are the changes needed?

Improve parallelism.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

Aug 09 '22 09:08 wangyum

cc @cloud-fan

Aug 09 '22 13:08 wangyum

@maryannxue FYI

Aug 10 '22 00:08 HyukjinKwon

I'm afraid this is more of an ad-hoc change specific to one type of query or scenario. The heuristics might now work well with other cases.

Aug 17 '22 16:08 maryannxue

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Nov 27 '22 00:11 github-actions[bot]

spark spark copied to clipboard

[SPARK-40017][SQL] Adaptive adjustment `spark.sql.adaptive.advisoryPartitionSizeInBytes`

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

spark
spark copied to clipboard