incubator-uniffle icon indicating copy to clipboard operation
incubator-uniffle copied to clipboard

[FEATURE][Spark] Support partial sort and combine for reducing shuffle data size

Open zuston opened this issue 3 years ago • 14 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the feature

In spark client, currently uniffle don't support merge for some ops supporting combine to reduce data size. Due to this, in some cases, it will cause unnecessary network and performance regression.

Motivation

No response

Describe the solution

No response

Additional context

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

zuston avatar Dec 23 '22 02:12 zuston

I encounter this problem in our online jobs, but I don't dig into it. Anyway, from my side, this feature is an improvement.

cc @xianjingfeng @advancedxy @jerqi

zuston avatar Dec 23 '22 02:12 zuston

You mean map side combine?

Well, last time I checked, uniffle don't do any combine, just create a combined value for each record.

If we are going to support this feature, maybe a lot of client side code should be rewrote...😇

advancedxy avatar Dec 23 '22 03:12 advancedxy

You mean map side combine?

Yes

If we are going to support this feature, maybe a lot of client side code should be rewrote...😇

Yes. 🙆

zuston avatar Dec 23 '22 04:12 zuston

we also have this demand. I met in hudi table which will use countByKey to evaluation work load.

KnightChess avatar Feb 04 '24 04:02 KnightChess

any plan to support it?

KnightChess avatar Feb 04 '24 04:02 KnightChess

any plan to support it?

haven't. If you want this, feel free to discuss more

zuston avatar Feb 04 '24 05:02 zuston

how about use ExternalAppendOnlyMap to archive it, but it's may spill to disk, we can add a param to controll it. And I think if map side can be combine, it will reduce shuffle data and reduce data skew problems

KnightChess avatar Feb 04 '24 12:02 KnightChess

ExternalAppendOnlyMap is used to handle huge data, I think it is not necessary. We can sort and combine in memory, then serialize the records. I did it in #1239 (developing) by this way. But I bind only the use of the remote merge feature.

I afraid that the proportion of being combined may be not high, since we use pipelined shuffle in rss, only partial record will be combined.

zhengchenyu avatar Feb 04 '24 12:02 zhengchenyu

The client and server side merge are all needed.

zuston avatar Feb 05 '24 01:02 zuston

I think the merging should work on the client side, not on the RSS server. It maximizes the utilization of executor resources, rather than placing the computation burden on the RSS server, whose resources are shared and need to ensure stability and the normal operation of core services.

KnightChess avatar Feb 06 '24 07:02 KnightChess

And furthermore, I think the RSS server is best left to handle data transmission efficiently without any content processing. It should return whatever it receives from user tasks. All data content processing is preferably performed by the executors, allowing the RSS server to focus solely on efficient shuffle transmission. What do you think?

KnightChess avatar Feb 06 '24 07:02 KnightChess

@KnightChess

If you think RSS sever should not handle any content, you can save the shuffle date to 3rd remote filesystem, and read the shuffle data from this filesystem.

What is your purpose of using remote shuffle service?

For me, the main purpose for remote merge in shuffle side is that application will not use any local file as shuffle storage, then we can realize the separation of storage and computing, then support hybrid deployment.

Then Remote merge is optional function. And If we merge on server side, we can reduce the resource usage of executor, then add the resource to shuffle server. That's not a problem.

zhengchenyu avatar Feb 06 '24 07:02 zhengchenyu

I think the merging should work on the client side, not on the RSS server.

I think the zheng's design motivation is not same with you @KnightChess , if you want this, client merge design is OK for me, that is not conflict with remote merge. Right @zhengchenyu

zuston avatar Feb 06 '24 08:02 zuston

Well, yes. I'm just offering a suggestion, sort in memory. There is no conflict!

zhengchenyu avatar Feb 06 '24 08:02 zhengchenyu