Shuang
gentle ping @FMX
Merged to main (v0.6.0). Could you also backport this PR to branch-0.5, @littlexyw?
ping @reswqa @codenohup
During Celeborn worker upgrades, the Flink client is likely to encounter connection exceptions, which can easily lead to upstream task reruns. This may be very costly. If the Flink side wants...
> Gentle ping @RexXiong @SteNicholas Sorry for the late reply. I think we can add a switch, defaulting to false, which will allow us to maintain the current behavior while...
> This PR can be closed because the features have already been implemented by [CELEBORN-1955](https://issues.apache.org/jira/projects/CELEBORN/issues/CELEBORN-1955) and [CELEBORN-1962](https://issues.apache.org/jira/projects/CELEBORN/issues/CELEBORN-1962). One can now configure nodeSelector/tolerations via `master.nodeSelector`/`worker.nodeSelector` and `master.tolerations`/`worker.tolerations`. Thanks @jesusch, based on...
> Any more comments on this? > > ping @AngersZhuuuu @jiang13021 @RexXiong This change forces the new client to be incompatible with older workers, which doesn't seem particularly necessary. Therefore, I will...
I don't think this change is quite right. For example, if attempts 0 and 1 have already failed, attempt 2 is running, and attempt 3 reports failed, according to the...
> ping @waitinfuture @RexXiong Please take a look. Thanks. See the comments at https://github.com/apache/celeborn/pull/2609
Thanks. Merged to main (v0.6.0).