Shuang

Results 32 comments of Shuang

Merged to main (v0.6.0). Could you also backport this PR to branch-0.5, @littlexyw?

During Celeborn worker upgrades, the Flink client is likely to encounter connection exceptions, which can easily lead to upstream task reruns. This may be very costly. If the Flink side wants...

> Gentle ping @RexXiong @SteNicholas Sorry for the late reply. I think we can add a switch, defaulting to false, which will allow us to maintain the current behavior while...

> This PR can be closed because the features have already been implemented by [CELEBORN-1955](https://issues.apache.org/jira/projects/CELEBORN/issues/CELEBORN-1955) and [CELEBORN-1962](https://issues.apache.org/jira/projects/CELEBORN/issues/CELEBORN-1962). Now one can configure nodeSelector/tolerations by `master.nodeSelector`/`worker.nodeSelector` and `master.tolerations`/`worker.tolerations`. Thanks @jesusch, based on...

> Any more comments on this? > > ping @AngersZhuuuu @jiang13021 @RexXiong This change forces the new client to be incompatible with older workers, which doesn't seem particularly necessary. Therefore, I will...

I don't think this change is quite right. For example, if attempts 0 and 1 have already failed, attempt 2 is running, and attempt 3 reports failed, according to the...

> ping @waitinfuture @RexXiong Please take a look. Thanks. See comments at https://github.com/apache/celeborn/pull/2609