Aldo Culquicondor

1226 comments by Aldo Culquicondor

Can I do a pull request for this or is there still a decision to be made?

cc @gaocegege @terrytangyuan @ahg-g @kawych

You can definitely use `restartPolicy: OnFailure`

The default in k8s is `Always`, which is not allowed for Jobs. I guess having `OnFailure` for MPIJob is fair. Do you mind sending a PR for that?
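As a rough illustration of the point above, here is a minimal sketch that builds a plain `batch/v1` Job manifest and rejects `Always`. The helper name `job_manifest` and the `demo`/`busybox` values are hypothetical, and a plain Job is used rather than an MPIJob, whose exact spec layout is not shown in this thread.

```python
import json

# Restart policies accepted for a Job's pod template.
# `Always` (the pod-level default) is rejected for Jobs.
JOB_RESTART_POLICIES = {"OnFailure", "Never"}


def job_manifest(name, image, restart_policy="OnFailure"):
    """Build a minimal batch/v1 Job manifest as a dict (hypothetical helper)."""
    if restart_policy not in JOB_RESTART_POLICIES:
        raise ValueError(
            f"restartPolicy {restart_policy!r} is not allowed for Jobs; "
            f"use one of {sorted(JOB_RESTART_POLICIES)}"
        )
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Set explicitly, since the pod default does not apply here.
                    "restartPolicy": restart_policy,
                    "containers": [{"name": name, "image": image}],
                }
            }
        },
    }


if __name__ == "__main__":
    # Illustrative values only.
    print(json.dumps(job_manifest("demo", "busybox"), indent=2))
```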

Yes. The change is not backwards compatible, so we can't do it for older versions.

@xhejtman are you still working on that PR? I think that would be the only change pending before we can release v2.

Yes, that's my fork. Please do it for this repository.

Alternatively, you can adapt the entrypoint that we have for Intel MPI (Intel MPI doesn't do retries, so waiting is absolutely necessary): https://github.com/kubeflow/mpi-operator/blob/master/examples/base/intel-entrypoint.sh

Can you add the rationale for the donation to the description? It would also be good to describe the high-level plan for kubeflow to use the kubernetes-sig/mpi-operator in their...

@omesser were there any scripts you used to generate the original chart?