InnerEye-DeepLearning: Multi-GPU jobs don't terminate when one worker fails

Status: Open. Opened by ant0nsc; 0 comments.

I thought this had been fixed in PyTorch Lightning (PL), but it seems not. In this RadiomicsNN job, one of the child processes raises an exception, yet the main process does not terminate: HD_a45c4cbd-1b83-44b4-bdd9-b76baf5a3547_4

AB#4699
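The behaviour the issue expects is "fail fast": once any worker dies, the launcher should stop the remaining workers and exit with a non-zero code instead of hanging. Below is a minimal, framework-agnostic sketch of that pattern using only the Python standard library. It is not InnerEye's or PyTorch Lightning's code; `run_and_fail_fast` and the stand-in worker commands are hypothetical, and each real worker would be a per-GPU training process.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def run_and_fail_fast(commands: Sequence[List[str]], poll_interval: float = 0.1) -> int:
    """Launch one process per command and wait for all of them.

    As soon as any process exits with a non-zero code, terminate the others
    and return that code. Returns 0 only if every process succeeds.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            codes = [p.poll() for p in procs]  # None means "still running"
            for code in codes:
                if code not in (None, 0):
                    return code  # first failure wins; the finally block cleans up
            if all(code == 0 for code in codes):
                return 0
            time.sleep(poll_interval)
    finally:
        # Never leave surviving workers running (and holding GPUs) after a failure.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()


if __name__ == "__main__":
    # Hypothetical stand-ins for per-GPU workers: rank 2 crashes, while the
    # other ranks would otherwise keep running (here: sleep for 60 s).
    hang = "import time; time.sleep(60)"
    crash = "raise RuntimeError('worker failed')"
    commands = [[sys.executable, "-c", crash if rank == 2 else hang] for rank in range(4)]
    print(f"launcher exit code: {run_and_fail_fast(commands)}")
```

The launcher polls instead of calling `wait()` on each child in turn, because waiting on a healthy rank first would block forever while the failed rank goes unnoticed.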

Posted by ant0nsc on Nov 17 '21 at 08:11.
