EmoNeXt

Train loss and val loss become NaN, and train/val accuracy drops to 13% and stays there for the rest of training

Open · Codenewwer opened this issue 10 months ago • 1 comment

Hello, I am trying to replicate the experiment using the "tiny" model. I downloaded the dataset and trained the model following the provided steps. At epoch 178, however, the train loss and val loss become NaN, the train and val accuracy drop to 13%, and neither changes in any subsequent epoch. Any ideas or guidance? Best!

Codenewwer · Apr 01 '24 02:04

I sometimes saw something similar (vanishing/exploding gradients). IIRC I just restarted training with the same parameters, and it usually worked fine on the next attempt (which didn't make much sense to me, since I should have been using the same random seed, but ¯\_(ツ)_/¯). If the problem persists, you could also try lowering your learning rate.

Enspire-Academy · Apr 30 '24 00:04
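
The reply above points at exploding gradients and a lower learning rate. As a rough illustration of the usual guards, here is a minimal, framework-free sketch: it clips gradients by their global L2 norm and skips any update whose loss or gradients are not finite, so a single bad batch cannot poison the weights. The function names (`global_grad_norm`, `clip_by_global_norm`, `safe_sgd_step`) and the default `max_norm=1.0` are hypothetical and chosen for illustration only; nothing here comes from the EmoNeXt training script.

```python
import math


def global_grad_norm(grads):
    """Return the L2 norm over all gradient values (a flat list of floats)."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm."""
    norm = global_grad_norm(grads)
    if norm <= max_norm or norm == 0.0:
        return list(grads)  # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]


def safe_sgd_step(params, grads, loss, lr, max_norm=1.0):
    """Apply one plain SGD step, or skip it if anything is NaN/inf.

    Returns (new_params, stepped), where stepped is False when the
    update was skipped and the parameters were left untouched.
    """
    finite = math.isfinite(loss) and all(math.isfinite(g) for g in grads)
    if not finite:
        return list(params), False
    clipped = clip_by_global_norm(grads, max_norm)
    new_params = [p - lr * g for p, g in zip(params, clipped)]
    return new_params, True


if __name__ == "__main__":
    # Normal batch: gradients are clipped, then applied.
    print(safe_sgd_step([1.0, 2.0], [3.0, 4.0], loss=0.5, lr=0.1))
    # Diverged batch: the step is skipped and the weights are preserved.
    print(safe_sgd_step([1.0, 2.0], [3.0, 4.0], loss=float("nan"), lr=0.1))
```

In a PyTorch training loop, the corresponding pieces would be `torch.isfinite(loss)` before calling `loss.backward()`, and `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` between `backward()` and `optimizer.step()`. Whether the training script in this repository already does either is not stated in the thread, so treat this as something to check. Lowering the learning rate, as suggested above, attacks the same problem from the other side by shrinking every update.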