mpi-operator

Keep Launcher Alive to Retrieve Logs on Error

Open njriasan opened this issue 3 years ago • 0 comments

Hi, I'm a relatively new user. I'm using my own Dockerfile for a project that uses MPICH, and I have been able to deploy the project successfully. However, when I test feature changes, I sometimes hit an error that terminates the launcher.

I understand that this project is designed to automatically restart the launcher up to backoffLimit times, but even when I set backoffLimit to 1, the launcher terminates, so I am unable to retrieve the logs. What is the best way to retrieve the logs in this situation? Is there an option to avoid terminating the launcher once the number of failures has reached the backoff limit?

njriasan, Apr 20 '21 15:04