
[Improvement] Client compression optimization

Open · zuston opened this issue 9 months ago · 1 comment

Code of Conduct

Search before asking

  • [x] I have searched in the issues and found no similar issues.

What would you like to be improved?

From inspecting many Spark jobs, I found that a large part of the client shuffle write time is spent on compression. Please see the following log.

2025-06-04 16:45:45,934 INFO writer.RssShuffleWriter: Finish write shuffle for appId[application_1710209318993_131126136_1749024542724], shuffleId[302], taskId[28290_0] with write 25645 ms, include checkSendResult[68], commit[0], WriteBufferManager cost copyTime[1725], writeTime[25187], serializeTime[950], sortTime[0], estimateTime[0], requireMemoryTime[135], uncompressedDataLen[8176523436], compressedDataLen[2839134582], compressTime[18677], compressRatio[2.8799353]

Looking at the Uniffle code base, the major cost of compression likely comes from buffer memory allocation.
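One way to cut that allocation cost is to reuse a single destination buffer across compress calls instead of allocating a fresh `byte[]` per block. The sketch below is hypothetical and not Uniffle's actual code; it uses JDK `Deflater` purely for illustration (Uniffle's codecs are pluggable, e.g. LZ4/ZSTD):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical sketch: keep one destination buffer alive across compress
// calls, growing it only when a larger worst-case size is needed.
public class ReusableCompressor {
    private final Deflater deflater = new Deflater();
    private byte[] dest = new byte[64 * 1024]; // reused; grown on demand, never shrunk

    /** Compresses src into the shared buffer and returns the compressed length. */
    public int compress(byte[] src) {
        // Conservative worst-case bound so the deflate loop always terminates.
        int worstCase = src.length + src.length / 1000 + 64;
        if (dest.length < worstCase) {
            dest = new byte[worstCase]; // only allocation path, taken rarely
        }
        deflater.reset();
        deflater.setInput(src);
        deflater.finish();
        int n = 0;
        while (!deflater.finished()) {
            n += deflater.deflate(dest, n, dest.length - n);
        }
        return n;
    }

    public byte[] buffer() {
        return dest;
    }

    public static void main(String[] args) {
        ReusableCompressor c = new ReusableCompressor();
        byte[] data = "uniffle ".repeat(200).getBytes(StandardCharsets.UTF_8);
        int len = c.compress(data);
        System.out.println("in=" + data.length + " compressed=" + len);
    }
}
```

With this pattern the steady-state write path performs zero compression-side allocations once the buffer has reached its high-water mark.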

How should we improve?

No response

Are you willing to submit PR?

  • [x] Yes I am willing to submit a PR!

zuston avatar Jun 05 '25 02:06 zuston

cc @xianjingfeng @jerqi

zuston avatar Jun 05 '25 02:06 zuston

Benefiting from the overlapping compression mechanism, the shuffle write speed has been significantly improved, achieving a 20% increase. Below is the average speed for the entire Spark job.

[Image: average shuffle write speed comparison]

This should be enabled by default, PTAL @xianjingfeng @jerqi

zuston avatar Aug 13 '25 03:08 zuston

> Benefiting from the overlapping compression mechanism, the shuffle write speed has been significantly improved, achieving a 20% increase. Below is the average speed for the entire Spark job.
>
> This should be enabled by default, PTAL [@xianjingfeng](https://github.com/xianjingfeng) [@jerqi](https://github.com/jerqi)

OK for me.

jerqi avatar Aug 13 '25 03:08 jerqi

All done. Closing this.

zuston avatar Aug 25 '25 03:08 zuston