[Improvement] Client compression optimization
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
Search before asking
- [x] I have searched in the issues and found no similar issues.
What would you like to be improved?
From inspecting many Spark jobs, I found that compression time accounts for a large part of the client shuffle write time. Please see the following log.
2025-06-04 16:45:45,934 INFO writer.RssShuffleWriter: Finish write shuffle for appId[application_1710209318993_131126136_1749024542724], shuffleId[302], taskId[28290_0] with write 25645 ms, include checkSendResult[68], commit[0], WriteBufferManager cost copyTime[1725], writeTime[25187], serializeTime[950], sortTime[0], estimateTime[0], requireMemoryTime[135], uncompressedDataLen[8176523436], compressedDataLen[2839134582], compressTime[18677], compressRatio[2.8799353]
Looking at the Uniffle code base, the major cost of compression may be the buffer memory allocation performed on each compress call.
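To illustrate the hypothesis, here is a minimal, hypothetical sketch (not Uniffle's actual writer code) of a compressor that allocates its output buffer once and reuses it across calls, instead of allocating a fresh buffer per batch. It uses the JDK's `Deflater` only for illustration; the `ReusableCompressor` class name and `maxCompressedSize` parameter are assumptions.

```java
import java.util.zip.Deflater;

// Hypothetical sketch: pre-allocate one output buffer and reuse it for
// every compress call, avoiding a per-batch allocation on the hot path.
public class ReusableCompressor {
    private final Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    private final byte[] outBuf; // allocated once, reused for every call

    public ReusableCompressor(int maxCompressedSize) {
        this.outBuf = new byte[maxCompressedSize];
    }

    /** Compresses input into the shared buffer; returns the compressed length. */
    public int compress(byte[] input) {
        deflater.reset();          // reuse the same native deflater state
        deflater.setInput(input);
        deflater.finish();
        return deflater.deflate(outBuf);
    }

    /** Exposes the shared buffer holding the last compressed output. */
    public byte[] buffer() {
        return outBuf;
    }
}
```

Note the returned data lives in the shared buffer, so the caller must copy or send it before the next `compress` call; that trade-off is the price of reuse.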
How should we improve?
No response
Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
cc @xianjingfeng @jerqi
Benefiting from the overlapping compression mechanism, shuffle write speed improved by about 20%. Below is the average speed for the entire Spark job.
This should be enabled by default, PTAL @xianjingfeng @jerqi
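The idea of overlapping compression can be sketched as follows. This is a hypothetical illustration, not the actual Uniffle implementation: the next batch is compressed on a background thread while the previous batch's compressed output is sent, so compression time hides behind I/O. The `OverlappedWriter` class, its `send` hook, and the use of JDK `Deflater` are all assumptions for the sketch.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.Deflater;

// Hypothetical sketch of overlapping compression: while batch N is being
// sent, batch N+1 is already compressing on a background thread.
public class OverlappedWriter {
    private final ExecutorService compressPool = Executors.newSingleThreadExecutor();

    private byte[] compress(byte[] input) {
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        d.setInput(input);
        d.finish();
        byte[] out = new byte[input.length + 64];
        int n = d.deflate(out);
        d.end();
        byte[] exact = new byte[n];
        System.arraycopy(out, 0, exact, 0, n);
        return exact;
    }

    public void writeAll(List<byte[]> batches) {
        try {
            Future<byte[]> pending = null;
            for (byte[] batch : batches) {
                // Kick off compression of this batch in the background...
                Future<byte[]> next = compressPool.submit(() -> compress(batch));
                // ...and send the previous batch while it runs (the overlap).
                if (pending != null) {
                    send(pending.get());
                }
                pending = next;
            }
            if (pending != null) {
                send(pending.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            compressPool.shutdown();
        }
    }

    /** Network send is elided; override to observe the compressed output. */
    protected void send(byte[] compressed) { }
}
```

In this sketch the compression thread and the sending thread form a two-stage pipeline, which is one plausible way a roughly 20% wall-clock improvement could come purely from hiding `compressTime` behind `writeTime`.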
OK for me.
All done. Close this