incubator-uniffle

[Improvement] support sequential unique block id

Open zhengchenyu opened this issue 2 years ago • 2 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

What would you like to be improved?

The block id overflow problem is described in #731 and #1398. The block id is used to verify that the reader obtained exactly the expected set of blocks. I think we can get a sequence id from the shuffle server, which would make overflow almost impossible. On the map side, we generate a local block id and then report the shuffle result. The shuffle server maps each local block id to a global, sequential, unique block id. On the reduce side, we get the shuffle result and then the set of global sequential unique block ids. Since the block ids are sequential, we only need to pass the length of the block set.
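To make the idea above more concrete, here is a minimal sketch of the server-side mapping. It is only an illustration under assumptions: the class `SequentialBlockIdSketch` and the methods `report`, `blockCount`, and `isComplete` are hypothetical names, not existing Uniffle APIs, and the sketch assumes one allocator instance per shuffle partition, with local block ids scoped by a task attempt id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Uniffle code. Assumes one instance per shuffle partition.
final class SequentialBlockIdSketch {
    // Next global id to hand out; assigned ids are dense in [0, nextGlobalId).
    private long nextGlobalId = 0L;
    // taskAttemptId -> (local block id -> global block id)
    private final Map<Long, Map<Long, Long>> assigned = new HashMap<>();

    // Called when a map task reports its shuffle result.
    // Re-reporting the same (taskAttemptId, localBlockId) returns the same global id.
    synchronized long[] report(long taskAttemptId, long[] localBlockIds) {
        Map<Long, Long> perTask =
            assigned.computeIfAbsent(taskAttemptId, k -> new HashMap<>());
        long[] globalIds = new long[localBlockIds.length];
        for (int i = 0; i < localBlockIds.length; i++) {
            Long existing = perTask.get(localBlockIds[i]);
            if (existing == null) {
                existing = nextGlobalId++;
                perTask.put(localBlockIds[i], existing);
            }
            globalIds[i] = existing;
        }
        return globalIds;
    }

    // The reduce side only needs this count, because the ids are 0..count-1.
    synchronized long blockCount() {
        return nextGlobalId;
    }

    // Reduce-side check: every id in [0, blockCount) was received, and nothing else.
    static boolean isComplete(Set<Long> receivedGlobalIds, long blockCount) {
        if (receivedGlobalIds.size() != blockCount) {
            return false;
        }
        for (long id = 0; id < blockCount; id++) {
            if (!receivedGlobalIds.contains(id)) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a 64-bit counter per partition replaces the packed block id layout, so the expected block set is described by a single number. How the blocks themselves would be tagged with their global ids is left open here.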

How should we improve?

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

zhengchenyu, Dec 27 '23 08:12

This proposal makes major changes to the API and needs to be fully discussed before development.

zhengchenyu, Dec 27 '23 08:12

For Spark, the block id is used to filter out unused ids by upstream mapID.

Could you describe this design in more detail? How about drafting a proposal?

zuston, Dec 27 '23 09:12