nnUNet

The train_loss and val_loss rise synchronously.

CodeHarcourt opened this issue 10 months ago • 4 comments

I've run into a new problem with the train_loss and the val_loss: both are too high, and they rise together. This might mean the dataset needs cleaning, or that the network structure should be changed. At epoch 5 the terminal shows:

Current learning rate: 0.00995
train_loss -0.7964
val_loss -0.7616
Pseudo dice [0.7497, 0.8259, 0.8752]
Epoch time: 154.69 s
Yayy! New best EMA pseudo Dice: 0.7575

and at the first epoch:

Current learning rate: 0.01
train_loss -0.3213
val_loss -0.6368
Pseudo dice [0.5874, 0.7518, 0.8273]
Epoch time: 168.24 s
Yayy! New best EMA pseudo Dice: 0.7221

I used your method to reset the data type of labelsTr, but maybe something went wrong. When I train for 50 epochs, I find that the train_loss and val_loss rise synchronously. Can anyone give me some advice, please?

CodeHarcourt avatar Mar 26 '24 11:03 CodeHarcourt

Please share your progress.png file

FabianIsensee avatar Mar 26 '24 15:03 FabianIsensee

Ok, em, I haven't saved the results of that training, so maybe I need to try it once again. But last time, the training results showed the train_loss and val_loss rising together.

CodeHarcourt avatar Mar 27 '24 04:03 CodeHarcourt

Here! You can see epochs 0 to 8! [screenshot]

CodeHarcourt avatar Mar 27 '24 12:03 CodeHarcourt

Here! You can see epochs 9 to 20! But the training results show the train_loss and val_loss rising together, and the Dice score is rising too! [screenshot]

CodeHarcourt avatar Mar 27 '24 13:03 CodeHarcourt

Hi @CodeHarcourt, it seems to me that your train loss is not steadily increasing but actually decreasing, as the values are generally becoming more negative: train loss at epoch 0 is ~ -0.2, and at epoch 20 it is ~ -0.6.

Similarly for the validation loss.
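For anyone puzzled by the negative numbers: nnU-Net trains with a compound loss of cross-entropy plus soft Dice, and the Dice term is the *negative* of the soft Dice coefficient, so the total loss dips below zero as the segmentation improves ("rising" magnitude toward more negative values is progress). A minimal NumPy sketch of this behavior, assuming a binary case — the function names and epsilon values here are illustrative, not nnU-Net's actual implementation:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-5):
    # Negative soft Dice coefficient: perfect overlap -> -1, no overlap -> ~0.
    intersect = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return -(2.0 * intersect + eps) / (denom + eps)

def combined_loss(pred, target):
    # Dice + binary cross-entropy. The CE term is >= 0, the Dice term is in
    # [-1, 0], so the sum goes negative once predictions become confident
    # and mostly correct.
    ce = -np.mean(target * np.log(pred + 1e-7)
                  + (1 - target) * np.log(1 - pred + 1e-7))
    return soft_dice_loss(pred, target) + ce

target = np.array([0.0, 1.0, 1.0, 0.0])
poor = np.array([0.5, 0.5, 0.5, 0.5])      # uninformative prediction
good = np.array([0.05, 0.95, 0.95, 0.05])  # confident, mostly correct

print(combined_loss(poor, target))  # positive: CE dominates early on
print(combined_loss(good, target))  # negative: Dice term dominates later
```

So a curve moving from -0.2 toward -0.8 over training, as in the logs above, is the expected healthy trajectory, not divergence.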

sten2lu avatar May 18 '24 17:05 sten2lu