InnerEye-DeepLearning
Multi-GPU jobs don't terminate when one worker fails
I thought this had been fixed in PyTorch Lightning (PL), but apparently it has not. This job in RadiomicsNN raises an exception in one of the child processes, yet the main process does not terminate: HD_a45c4cbd-1b83-44b4-bdd9-b76baf5a3547_4
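For illustration, the behaviour I would expect can be sketched with plain subprocesses: the parent polls its workers and, as soon as one exits with a non-zero code, terminates the rest and returns that code. This is only a minimal sketch of the pattern, not how PL or InnerEye actually launch workers; `WORKER_SOURCE`, `run_workers`, `fail_rank` and `poll_interval` are hypothetical names made up for the example.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a training worker: the rank equal to fail_rank
# raises, every other rank blocks as if waiting on its peers.
WORKER_SOURCE = """
import sys, time
rank, fail_rank = int(sys.argv[1]), int(sys.argv[2])
if rank == fail_rank:
    raise RuntimeError(f"worker {rank} failed")
time.sleep(60)
"""


def run_workers(num_workers: int, fail_rank: int, poll_interval: float = 0.1) -> int:
    """Start the workers and return the exit code of the first one that fails (0 if none)."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(rank), str(fail_rank)]
        )
        for rank in range(num_workers)
    ]
    first_failure = 0
    try:
        while True:
            # poll() returns None while a worker is still running.
            codes = [p.poll() for p in procs]
            failed = [c for c in codes if c not in (None, 0)]
            if failed:
                first_failure = failed[0]
                break
            if all(c == 0 for c in codes):
                break
            time.sleep(poll_interval)
    finally:
        # Do not leave healthy workers hanging once one of them has died.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
    return first_failure
```

Calling `run_workers(2, 1)` should return shortly after worker 1 raises, instead of waiting for worker 0 to finish; a launcher script could pass the result to `sys.exit` so the job is marked as failed.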