[SPARK-49386][SPARK-27734][CORE][SQL] Add memory based thresholds for shuffle spill
Original author: @amuraru
What changes were proposed in this pull request?
This PR adds memory-size-based thresholds for forcing shuffle spill.
It introduces the following configurations:
- spark.shuffle.spill.maxRecordsSizeForSpillThreshold
- spark.sql.windowExec.buffer.spill.size.threshold
- spark.sql.sessionWindow.buffer.spill.size.threshold
- spark.sql.sortMergeJoinExec.buffer.spill.size.threshold
- spark.sql.cartesianProductExec.buffer.spill.size.threshold
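As a hedged illustration (not taken from the PR itself), the new thresholds could be set when building a session. The values below are placeholders, not tuning recommendations, and the accepted value formats should be checked against the Spark configuration documentation:

```python
# Sketch only: setting the size-based spill thresholds from PySpark.
# Requires a Spark installation; the values shown are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("size-based-spill-demo")
    # Core shuffle: force a spill once buffered records exceed this size.
    .config("spark.shuffle.spill.maxRecordsSizeForSpillThreshold", "256m")
    # SQL operators: per-operator size thresholds for their spill buffers.
    .config("spark.sql.windowExec.buffer.spill.size.threshold", "128m")
    .config("spark.sql.sortMergeJoinExec.buffer.spill.size.threshold", "128m")
    .config("spark.sql.cartesianProductExec.buffer.spill.size.threshold", "128m")
    .getOrCreate()
)
```

These complement, rather than replace, the existing record-count threshold, so either limit can trigger a spill.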
Why are the changes needed?
https://github.com/apache/spark/pull/24618
Currently, spills can only be forced by record count, via spark.shuffle.spill.numElementsForceSpillThreshold. In some scenarios a single row can be very large in memory, so a record-count threshold alone cannot bound the memory held before spilling.
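To make the motivation concrete, here is a small illustrative sketch (not Spark's actual implementation) of a spill tracker that forces a spill when either the record count or the accumulated in-memory size crosses its threshold. With large rows, the size limit fires long before the count limit would:

```python
# Illustrative sketch of a dual-threshold spill decision. The class name and
# API are invented for this example; Spark's real spill logic differs.
class SpillTracker:
    def __init__(self, max_records, max_bytes):
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.num_records = 0
        self.total_bytes = 0

    def insert(self, record_size_bytes):
        """Account for one buffered record; return True if a spill is forced."""
        self.num_records += 1
        self.total_bytes += record_size_bytes
        return (self.num_records >= self.max_records
                or self.total_bytes >= self.max_bytes)

    def reset(self):
        """Clear counters after the buffer has been spilled to disk."""
        self.num_records = 0
        self.total_bytes = 0


# 10 MiB rows with a 128 MiB size limit: the size threshold trips after
# 13 rows (130 MiB), while a count-only limit of 1,000,000 never would.
tracker = SpillTracker(max_records=1_000_000, max_bytes=128 * 1024 * 1024)
rows_until_spill = 0
while not tracker.insert(10 * 1024 * 1024):
    rows_until_spill += 1
```

The point is that a byte-based bound caps memory regardless of how many records it takes to reach it.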
Does this PR introduce any user-facing change?
No
How was this patch tested?
GA
Verified in a production environment: task time is shortened, the number of disk spills is reduced, shuffle data gets a better chance to compress, and the size spilled to disk is also significantly reduced.
Current (before this PR):
24/08/19 07:02:54,947 [Executor task launch worker for task 0.0 in stage 53.0 (TID 1393)] INFO ShuffleExternalSorter: Thread 126 spilling sort data of 62.0 MiB to disk (11490 times so far)
24/08/19 07:02:55,029 [Executor task launch worker for task 0.0 in stage 53.0 (TID 1393)] INFO ShuffleExternalSorter: Thread 126 spilling sort data of 62.0 MiB to disk (11491 times so far)
24/08/19 07:02:55,093 [Executor task launch worker for task 0.0 in stage 53.0 (TID 1393)] INFO ShuffleExternalSorter: Thread 126 spilling sort data of 62.0 MiB to disk (11492 times so far)
24/08/19 07:08:59,894 [Executor task launch worker for task 0.0 in stage 53.0 (TID 1393)] INFO Executor: Finished task 0.0 in stage 53.0 (TID 1393). 7409 bytes result sent to driver
With this PR:
Was this patch authored or co-authored using generative AI tooling?
No
Let's probably file a new JIRA
Gentle ping, @cxzl25 and @mridulm .
Although we have enough time until Feature Freeze, I'm wondering if we can deliver this via Apache Spark 4.0.0-preview2 RC1 (next Monday). WDYT?
I am a bit swamped unfortunately, and I don't think I will be able to ensure this gets merged before next Monday @dongjoon-hyun - sorry about that :-(
@cxzl25, will try to get around to reviewing this soon - apologies for the delay
+CC @Ngone51 as well.
Thank you for letting me know, @mridulm ~ No problem at all.
Kind ping @mridulm, do you have a chance to take another look? I also found this PR helpful for the stability of jobs that spill huge amounts of data.
I am planning to merge this next week if there are no concerns @cloud-fan , @dongjoon-hyun. It has been open for quite a while, and is a very helpful fix to mitigate memory issues.
I am not super keen on the naming of some of the SQL configs; I would like your thoughts on that (as well as the rest of the PR).
Also, +CC @attilapiros for feedback as well.
@mridulm @cxzl25 @attilapiros @HyukjinKwon @pan3793
Hi all, I was just curious whether there are any remaining issues with this PR, or if it will be merged into OSS Spark sometime soon. Thanks again for making this change!
I did not merge it given @attilapiros was actively reviewing it. Are there any other concerns/comments on this Attila ?
checking
If the current changes look good, can you merge it please @attilapiros? I am travelling and don't have access to my desktop :)
Thank you! @mridulm @attilapiros @cxzl25 , looking forward to this change in coming spark release.
Merged to master.
Let's probably file a new JIRA
@HyukjinKwon Can I close the old jira (https://issues.apache.org/jira/browse/SPARK-27734) as a duplicate or what was your plan when you asked for a new ticket?