Yuan Tang
> migrate the v2 implementations to the training operator

Are you suggesting moving the entire codebase to training-operator? Or using mpi-operator as a library?
Sounds good
> there are many issues in the training operator (e.g. inconsistent job conditions, not using headless svc, and so on)

Can you expand on this? This would be helpful for...
> We leave the individual mpi-operator, and the training-operator uses mpi-operator as a library. It means that users can deploy MPIJob v2 as either part of the training operator or...
This would be a breaking change but I think it's relatively safe given how few people are using MXNet these days. cc @kubeflow/wg-training-leads
Any concerns about removing this? I don't think we want to continue supporting it.
Does anyone want to pick this up?
I don't think I'll have time to complete this. If you are interested in contributing, feel free to start a new PR.
Great. Thanks!
I'd like to get https://github.com/kubeflow/training-operator/pull/1953 merged as well. I think the risk is pretty low.