Ethan Feng

Results: 39 comments by Ethan Feng

If the rack configs are changed and the masters are restarted but the workers are not, this PR will get the wrong network locations until the workers are lost and register again.

In stress testing, did the master fail to resolve network locations?

> ping @FMX @SteNicholas @waitinfuture @turboFei PTAL thanks. I am not sure regarding the dependency check issues failing above, I ran `./dev/dependencies.sh --replace` , but it is still failing. Is...

Thanks. Merged into main (v0.6.0).

According to your Jira ticket, "that shuffle fetch fails does not lead to stage fail because task speculation and another attempts succeed", I think the quoted scenario should not happen...

> > According to your Jira ticket, "that shuffle fetch fails does not lead to stage fail because task speculation and another attempts succeed", I think the quoted scenario should...

> > > > According to your Jira ticket, "that shuffle fetch fails does not lead to stage fail because task speculation and another attempts succeed", I think the quoted...

@buska88 It would be better to check the validity of a shuffle ID after you get it instead of using a shuffle ID that is marked as invalid. You can add...

I've updated the protobuf names, and there are more changes needed in this PR. You should change `HAHelper.convertByteStringToRequest` and `HAHelper.convertRequestToByteString` to use the meta request structures. As we have discussed...