
Validation Loss Statistics: min=nan, med=nan, mean=nan, max=nan

Open sun2009ban opened this issue 5 years ago • 7 comments

Thank you for the code! When I ran it on MNIST with `python train.py --dataset mnist`, I got the output `Validation Loss Statistics: min=nan, med=nan, mean=nan, max=nan` after training for several epochs.

Please help me, I have no idea what is going wrong.

sun2009ban avatar Jan 15 '19 01:01 sun2009ban

I am having the same issue. Any fixes for this?

phongnhhn92 avatar Mar 05 '19 17:03 phongnhhn92

The model collapses very quickly with an exploding loss. I want to figure out why.

ranery avatar Mar 22 '19 07:03 ranery

It seems to have nothing to do with the initialization method.

ranery avatar Mar 22 '19 08:03 ranery

I found it! The Adam parameter beta_2 should never be 0.01 :)

ranery avatar Mar 22 '19 08:03 ranery
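To see why a tiny beta_2 is dangerous, here is a minimal pure-Python sketch (not the repo's actual `train.py`; the function name is illustrative) of Adam's second-moment tracking. With beta_2 = 0.01 the exponential moving average forgets almost everything except the latest gradient, so a single near-zero gradient makes v_hat tiny and the effective step enormous:

```python
import random

def adam_step_scale(grads, beta2, eps=1e-8):
    """Track Adam's second-moment EMA for a stream of scalar gradients
    and return the effective per-step scaling 1 / (sqrt(v_hat) + eps)."""
    v, scales = 0.0, []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = v / (1.0 - beta2 ** t)  # bias correction
        scales.append(1.0 / (v_hat ** 0.5 + eps))
    return scales

random.seed(0)
grads = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Skip the first 50 warm-up steps, then compare the largest step scale.
bad = max(adam_step_scale(grads, 0.01)[50:])
good = max(adam_step_scale(grads, 0.999)[50:])
print(f"max step scale, beta2=0.01:  {bad:.1f}")
print(f"max step scale, beta2=0.999: {good:.1f}")
```

With the PyTorch default `betas=(0.9, 0.999)` the second moment averages over roughly the last 1,000 gradients, so the step scale stays close to 1/RMS of the gradients; with beta_2 = 0.01 it spikes whenever one gradient happens to be small, which can blow the weights up in a single step.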

> I found it! The Adam parameter beta_2 should never be 0.01 :)

Still collapses after a few epochs :(

ranery avatar Mar 22 '19 10:03 ranery

Seems like it collapses to one mode after 2,000–3,000 iterations. Can anybody give a reason?

ranery avatar Mar 22 '19 14:03 ranery

> Seems like it collapses to one mode after 2,000–3,000 iterations. Can anybody give a reason?

The norm of h, i.e., f(x), gets bigger and bigger as training evolves, so the torch.exp(h) term approaches infinity. I looked into the model parameters and found that their norms also increase a lot during training. One way to mitigate the issue is L1 regularization. It also helps to use F.softplus to make the logistic prior calculation numerically stable.

leviszhang avatar Jun 19 '20 03:06 leviszhang
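A minimal pure-Python sketch of the stabilization described above (function names are illustrative; `F.softplus` is PyTorch's built-in stable softplus). The standard logistic log-density can be written as `log p(h) = -softplus(h) - softplus(-h)`; computing it that way stays finite, while the naive `log(1 + exp(h))` overflows once `h` exceeds roughly 709, which is exactly how the loss turns into inf/nan:

```python
import math

def softplus(x):
    # numerically stable softplus: log(1 + e^x) computed without overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def logistic_logpdf(h):
    # log-density of a standard logistic prior:
    # log p(h) = -softplus(h) - softplus(-h)
    return -softplus(h) - softplus(-h)

def logistic_logpdf_naive(h):
    # direct translation of the same formula; math.exp(h) overflows
    # for large h, so the loss becomes inf and then nan
    return -math.log(1.0 + math.exp(h)) - math.log(1.0 + math.exp(-h))

print(logistic_logpdf(800.0))  # finite: -800.0
try:
    print(logistic_logpdf_naive(800.0))
except OverflowError as e:
    print("naive version overflows:", e)
```

The stable form works because `softplus(x) = max(x, 0) + log1p(exp(-|x|))` only ever exponentiates a non-positive number, so the exponential can underflow to 0 but never overflow. L1 regularization attacks the same symptom from the other side, by keeping the parameter norms (and hence `h`) from growing unboundedly.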