nnUNet

Training hangs every few hundred epochs

Open claralebbos opened this issue 3 years ago • 1 comment

Dear Fabian, thank you for your code. I have a question about training: I am training an nnUNet model with 5 folds, but every few hundred epochs the training seems to hang (see attached images). The only workaround is to resume the training every time it hangs, but this makes the total training time much longer. Would you happen to know why this is happening?

Thank you.

[Screenshots: 2022-03-14 at 16:00:29 and 2022-03-14 at 16:00:54]

claralebbos, Mar 14 '22 16:03

Hi, hard to say because that doesn't happen for me. What configuration are you training? And have you looked at your RAM? If the RAM gets full, this can happen.

Best,
Fabian

FabianIsensee, Mar 15 '22 11:03
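
To check whether RAM is filling up while training runs, one option is to log system memory usage from a second terminal and compare the timestamps with the moment the training stalls. The following is only a minimal sketch, assuming a Linux machine where `/proc/meminfo` is available. The function names (`parse_meminfo`, `used_fraction`, `log_ram`) are hypothetical helpers for illustration and are not part of nnUNet.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def used_fraction(info):
    """Fraction of RAM in use, estimated from MemTotal and MemAvailable."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def log_ram(interval_s=60, path="/proc/meminfo"):
    """Print RAM usage every interval_s seconds until interrupted (Linux only)."""
    while True:
        with open(path) as f:
            info = parse_meminfo(f.read())
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} RAM used: {used_fraction(info):.1%}", flush=True)
        time.sleep(interval_s)
```

Calling `log_ram()` in a separate Python session while the folds train would show whether usage creeps toward 100% shortly before a hang. If it does, that would be consistent with the full-RAM explanation above, though it would not by itself prove the cause.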