Junfan Zhang

434 comments by Junfan Zhang

After rethinking this, I think `reassignAllShuffleServersForWholeStage` could be invoked by the retry writer rather than by the previously failed writer. That ensures no older data reaches the server after re-registration.
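A minimal sketch of that ordering, assuming a driver-side registry that records which stage attempt the current server assignment belongs to. `StageServerRegistry`, `servers_for`, `StaleAttemptError` and the `allocate` callback are hypothetical names for illustration only, not Uniffle's API. The first writer of a newer (retry) attempt triggers the whole-stage reassignment once, and writers from the failed attempt are refused afterwards, so they cannot push older data to the newly registered servers.

```python
import threading
from typing import Callable, Dict, List, Tuple


class StaleAttemptError(Exception):
    """Raised when a writer from an older stage attempt asks for servers."""


class StageServerRegistry:
    """Hypothetical driver-side registry; not the Uniffle implementation."""

    def __init__(self, allocate: Callable[[int, int], List[str]]):
        # allocate(shuffle_id, stage_attempt) returns a fresh server list (assumed).
        self._allocate = allocate
        # shuffle_id -> (stage_attempt, servers)
        self._assignments: Dict[int, Tuple[int, List[str]]] = {}
        self._lock = threading.Lock()

    def servers_for(self, shuffle_id: int, stage_attempt: int) -> List[str]:
        with self._lock:
            current = self._assignments.get(shuffle_id)
            if current is None or stage_attempt > current[0]:
                # The first writer of a newer attempt reassigns servers for the
                # whole stage; later writers of that attempt reuse the result.
                servers = self._allocate(shuffle_id, stage_attempt)
                self._assignments[shuffle_id] = (stage_attempt, servers)
                return servers
            if stage_attempt < current[0]:
                # A writer of the failed attempt must not reach the new servers.
                raise StaleAttemptError(
                    f"shuffle {shuffle_id}: attempt {stage_attempt} < {current[0]}")
            return current[1]
```

Because the reassignment is keyed on the attempt number, repeated calls from writers of the same retry attempt do not trigger further reassignments.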

> It's dangerous to delete the failed data of the stage when we retry. It's hard to reach the condition to delete the data. Could you describe more?

> > > It's dangerous to delete the failed data of the stage when we retry. It's hard to reach the condition to delete the data.
> > >
> ...

Could you help review this, @EnricoMi @jerqi? The Spark 2 change will be finished once this PR looks OK to you.

> 1. How to reject the legacy requests? Using the latest attempt id on the server side to check whether the send request is valid with the older version, this will...
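One way to picture the server-side check, as a sketch under stated assumptions: the server remembers the latest stage attempt registered per `(app_id, shuffle_id)` and rejects any send request whose attempt differs from it. `ShuffleServerSketch`, `register`, `send_data` and `blocks_for_latest` are hypothetical names, not Uniffle's RPC API. Data from older attempts is left in place here; deleting it is a separate question.

```python
import threading
from collections import defaultdict
from typing import Dict, List, Tuple


class ShuffleServerSketch:
    """Hypothetical server-side state; names are illustrative, not Uniffle's API."""

    def __init__(self) -> None:
        # (app_id, shuffle_id) -> latest stage attempt registered by the client
        self._latest_attempt: Dict[Tuple[str, int], int] = {}
        # (app_id, shuffle_id, stage_attempt) -> accepted blocks
        self._blocks: Dict[Tuple[str, int, int], List[bytes]] = defaultdict(list)
        self._lock = threading.Lock()

    def register(self, app_id: str, shuffle_id: int, stage_attempt: int) -> None:
        key = (app_id, shuffle_id)
        with self._lock:
            # Only move forward: a delayed register from an older attempt
            # cannot roll the latest attempt back.
            self._latest_attempt[key] = max(stage_attempt, self._latest_attempt.get(key, -1))

    def send_data(self, app_id: str, shuffle_id: int, stage_attempt: int, block: bytes) -> bool:
        key = (app_id, shuffle_id)
        with self._lock:
            if self._latest_attempt.get(key) != stage_attempt:
                # Legacy request from a superseded attempt (or from an attempt
                # that has not registered yet): reject it.
                return False
            self._blocks[(app_id, shuffle_id, stage_attempt)].append(block)
            return True

    def blocks_for_latest(self, app_id: str, shuffle_id: int) -> List[bytes]:
        with self._lock:
            latest = self._latest_attempt.get((app_id, shuffle_id))
            if latest is None:
                return []
            return list(self._blocks.get((app_id, shuffle_id, latest), []))
```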

> Can we register a shuffle as the tuple `(shuffle_id, stage_attempt_id)`? This way, we do not need to wait for `(shuffle_id, 0)` to be deleted synchronously, and can go...
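The tuple-keyed registration could look roughly like this, assuming a single-threaded registry for brevity. `AttemptScopedRegistry` and its methods are hypothetical names, not Uniffle code. Registering `(shuffle_id, 1)` only queues `(shuffle_id, 0)` for later cleanup instead of deleting it synchronously, and a zombie writer of attempt 0 writes into its own key, so it cannot pollute attempt 1.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# (shuffle_id, stage_attempt_id), as proposed in the quoted comment
ShuffleKey = Tuple[int, int]


class AttemptScopedRegistry:
    """Hypothetical, single-threaded server registry; not Uniffle's API."""

    def __init__(self) -> None:
        self._data: Dict[ShuffleKey, List[bytes]] = {}
        # Superseded attempts waiting for (normally background) deletion.
        self._pending_deletes: Deque[ShuffleKey] = deque()

    def register(self, shuffle_id: int, stage_attempt_id: int) -> None:
        # Older attempts of the same shuffle are only queued for cleanup, so
        # registering the new attempt never waits for a synchronous delete.
        for key in self._data:
            if key[0] == shuffle_id and key[1] < stage_attempt_id and key not in self._pending_deletes:
                self._pending_deletes.append(key)
        self._data.setdefault((shuffle_id, stage_attempt_id), [])

    def write(self, shuffle_id: int, stage_attempt_id: int, block: bytes) -> None:
        # An old-attempt writer lands in its own key; after cleanup it gets KeyError.
        self._data[(shuffle_id, stage_attempt_id)].append(block)

    def read(self, shuffle_id: int, stage_attempt_id: int) -> List[bytes]:
        return list(self._data[(shuffle_id, stage_attempt_id)])

    def cleanup_one(self) -> bool:
        """One step of the deferred cleanup; returns False when nothing is pending."""
        if not self._pending_deletes:
            return False
        self._data.pop(self._pending_deletes.popleft(), None)
        return True
```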

> Spark client can easily come up with a per-stage-attempt shuffle id and feed that to the shuffle server. That should not require any server-side refactoring.

Thanks for your review....
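The client-side idea quoted above can be sketched as a driver-side allocator that hands out a fresh integer for every `(shuffle_id, stage_attempt)` pair and sends that integer to the shuffle server as an ordinary shuffle ID. `PerAttemptShuffleIdAllocator` and `shuffle_id_for` are hypothetical names; the sketch also assumes writers and readers obtain the mapping from the same driver, which is not shown.

```python
import itertools
import threading
from typing import Dict, Tuple


class PerAttemptShuffleIdAllocator:
    """Hypothetical driver-side allocator; not Spark's or Uniffle's API.

    Each (shuffle_id, stage_attempt) pair gets a fresh integer that the
    server treats as a plain shuffle ID, so it needs no notion of attempts.
    """

    def __init__(self) -> None:
        self._next_id = itertools.count()
        self._ids: Dict[Tuple[int, int], int] = {}
        self._lock = threading.Lock()

    def shuffle_id_for(self, shuffle_id: int, stage_attempt: int) -> int:
        with self._lock:
            key = (shuffle_id, stage_attempt)
            if key not in self._ids:
                self._ids[key] = next(self._next_id)
            return self._ids[key]
```

A counter is used instead of packing the attempt into spare bits of the shuffle ID, which avoids any assumption about how many attempts or shuffles fit into an integer.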

> > > > Spark client can easily come up with a per-stage-attempt shuffle id and feed that to the shuffle server. That should not require any server-side refactoring.
>
> ...

> > > If we make the unique shuffleIdWithAttemptNo generated or converted in server side
> > >
> > > I presume the server side does not know about...