[SPARK-49386][SPARK-27734][CORE][SQL] Add memory based thresholds for shuffle spill
Original author: @amuraru
What changes were proposed in this pull request?
This PR adds memory-size-based thresholds for forcing shuffle spill.
It introduces the following configurations:
- spark.shuffle.spill.maxRecordsSizeForSpillThreshold
- spark.sql.windowExec.buffer.spill.size.threshold
- spark.sql.sessionWindow.buffer.spill.size.threshold
- spark.sql.sortMergeJoinExec.buffer.spill.size.threshold
- spark.sql.cartesianProductExec.buffer.spill.size.threshold
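As a hedged illustration (not taken from the PR itself), the new thresholds could be set when building a session. The values below are placeholders, not tuning recommendations, and the accepted value formats should be checked against the Spark configuration documentation:

```python
# Sketch only: setting the size-based spill thresholds from PySpark.
# Requires a Spark installation; the values shown are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("size-based-spill-demo")
    # Core shuffle: force a spill once buffered records exceed this size.
    .config("spark.shuffle.spill.maxRecordsSizeForSpillThreshold", "256m")
    # SQL operators: per-operator size thresholds for their spill buffers.
    .config("spark.sql.windowExec.buffer.spill.size.threshold", "128m")
    .config("spark.sql.sortMergeJoinExec.buffer.spill.size.threshold", "128m")
    .config("spark.sql.cartesianProductExec.buffer.spill.size.threshold", "128m")
    .getOrCreate()
)
```

These complement, rather than replace, the existing record-count threshold, so either limit can trigger a spill.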
Why are the changes needed?
https://github.com/apache/spark/pull/24618
Currently, spills can only be forced by record count, via spark.shuffle.spill.numElementsForceSpillThreshold. In some scenarios a single row can be very large in memory, so a record-count threshold alone cannot bound the memory held before spilling.
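To make the motivation concrete, here is a small illustrative sketch (not Spark's actual implementation) of a spill tracker that forces a spill when either the record count or the accumulated in-memory size crosses its threshold. With large rows, the size limit fires long before the count limit would:

```python
# Illustrative sketch of a dual-threshold spill decision. The class name and
# API are invented for this example; Spark's real spill logic differs.
class SpillTracker:
    def __init__(self, max_records, max_bytes):
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.num_records = 0
        self.total_bytes = 0

    def insert(self, record_size_bytes):
        """Account for one buffered record; return True if a spill is forced."""
        self.num_records += 1
        self.total_bytes += record_size_bytes
        return (self.num_records >= self.max_records
                or self.total_bytes >= self.max_bytes)

    def reset(self):
        """Clear counters after the buffer has been spilled to disk."""
        self.num_records = 0
        self.total_bytes = 0


# 10 MiB rows with a 128 MiB size limit: the size threshold trips after
# 13 rows (130 MiB), while a count-only limit of 1,000,000 never would.
tracker = SpillTracker(max_records=1_000_000, max_bytes=128 * 1024 * 1024)
rows_until_spill = 0
while not tracker.insert(10 * 1024 * 1024):
    rows_until_spill += 1
```

The point is that a byte-based bound caps memory regardless of how many records it takes to reach it.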
Does this PR introduce any user-facing change?
No
How was this patch tested?
GA
Verified in a production environment: task time is shortened, the number of disk spills is reduced, shuffle data gets a better chance to compress, and the size spilled to disk is also significantly reduced.
Current (before this PR):
24/08/19 07:02:54,947 [Executor task launch worker for task 0.0 in stage 53.0 (TID 1393)] INFO ShuffleExternalSorter: Thread 126 spilling sort data of 62.0 MiB to disk (11490 times so far)
24/08/19 07:02:55,029 [Executor task launch worker for task 0.0 in stage 53.0 (TID 1393)] INFO ShuffleExternalSorter: Thread 126 spilling sort data of 62.0 MiB to disk (11491 times so far)
24/08/19 07:02:55,093 [Executor task launch worker for task 0.0 in stage 53.0 (TID 1393)] INFO ShuffleExternalSorter: Thread 126 spilling sort data of 62.0 MiB to disk (11492 times so far)
24/08/19 07:08:59,894 [Executor task launch worker for task 0.0 in stage 53.0 (TID 1393)] INFO Executor: Finished task 0.0 in stage 53.0 (TID 1393). 7409 bytes result sent to driver
With this PR:
Was this patch authored or co-authored using generative AI tooling?
No
Let's probably file a new JIRA
Gentle ping, @cxzl25 and @mridulm .
Although we have enough time until Feature Freeze, I'm wondering if we can deliver this via Apache Spark 4.0.0-preview2 RC1 (next Monday). WDYT?
I am a bit swamped unfortunately, and I don't think I will be able to ensure this gets merged before next Monday @dongjoon-hyun - sorry about that :-(
@cxzl25, will try to get around to reviewing this soon - apologies for the delay
+CC @Ngone51 as well.
Thank you for letting me know, @mridulm ~ No problem at all.
Kind ping @mridulm, do you have a chance to take another look? I also found this PR helpful for the stability of jobs that spill huge amounts of data.
I am planning to merge this next week if there are no concerns @cloud-fan , @dongjoon-hyun. It has been open for quite a while, and is a very helpful fix to mitigate memory issues.
I am not super keen on the naming of some of the SQL configs; I would like your thoughts on that (as well as the rest of the PR).
Also, +CC @attilapiros for feedback as well.
@mridulm @cxzl25 @attilapiros @HyukjinKwon @pan3793
Hi all, I was just curious whether there are any remaining issues with this PR, or if it will be merged into OSS Spark sometime soon. Thanks again for making this change!
I did not merge it given @attilapiros was actively reviewing it. Are there any other concerns/comments on this Attila ?
checking
If the current changes look good, can you merge it please @attilapiros? I am travelling and don't have access to my desktop :)
Thank you! @mridulm @attilapiros @cxzl25 , looking forward to this change in coming spark release.
Merged to master.
Let's probably file a new JIRA
@HyukjinKwon Can I close the old jira (https://issues.apache.org/jira/browse/SPARK-27734) as a duplicate or what was your plan when you asked for a new ticket?