
Validation Loss Statistics: min=nan, med=nan, mean=nan, max=nan

Open sun2009ban opened this issue 5 years ago • 7 comments

Thank you for the code! When I ran it on MNIST with `python train.py --dataset mnist`, I got the output `Validation Loss Statistics: min=nan, med=nan, mean=nan, max=nan` after training for several epochs.

Please help me, I have no idea what is going wrong.

sun2009ban avatar Jan 15 '19 01:01 sun2009ban

I am having the same issue. Any fixes for this?

phongnhhn92 avatar Mar 05 '19 17:03 phongnhhn92

The model collapses very quickly with an exploding loss. I want to figure out why.

ranery avatar Mar 22 '19 07:03 ranery

It seems to have nothing to do with the initialization method.

ranery avatar Mar 22 '19 08:03 ranery

I found it! The Adam parameter beta_2 should never be 0.01 :)

ranery avatar Mar 22 '19 08:03 ranery
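To see why a tiny beta_2 is dangerous, here is a minimal pure-Python sketch (not the repo's actual `train.py`; the function name is illustrative) of Adam's second-moment tracking. With beta_2 = 0.01 the exponential moving average forgets almost everything except the latest gradient, so a single near-zero gradient makes v_hat tiny and the effective step enormous:

```python
import random

def adam_step_scale(grads, beta2, eps=1e-8):
    """Track Adam's second-moment EMA for a stream of scalar gradients
    and return the effective per-step scaling 1 / (sqrt(v_hat) + eps)."""
    v, scales = 0.0, []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = v / (1.0 - beta2 ** t)  # bias correction
        scales.append(1.0 / (v_hat ** 0.5 + eps))
    return scales

random.seed(0)
grads = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Skip the first 50 warm-up steps, then compare the largest step scale.
bad = max(adam_step_scale(grads, 0.01)[50:])
good = max(adam_step_scale(grads, 0.999)[50:])
print(f"max step scale, beta2=0.01:  {bad:.1f}")
print(f"max step scale, beta2=0.999: {good:.1f}")
```

With the PyTorch default `betas=(0.9, 0.999)` the second moment averages over roughly the last 1,000 gradients, so the step scale stays close to 1/RMS of the gradients; with beta_2 = 0.01 it spikes whenever one gradient happens to be small, which can blow the weights up in a single step.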

> I found it! The Adam parameter beta_2 should never be 0.01 :)

Still collapses after a few epochs :(

ranery avatar Mar 22 '19 10:03 ranery

Seems like it collapses to one mode after 2,000–3,000 iterations. Can anybody give a reason?

ranery avatar Mar 22 '19 14:03 ranery

> Seems like it collapses to one mode after 2,000–3,000 iterations. Can anybody give a reason?

The norm of h, i.e., f(x), gets bigger and bigger as training evolves, so the torch.exp(h) term approaches infinity. I looked into the model parameters and found that their norms also increase a lot during training. One way to mitigate the issue is L1 regularization. It also helps to use F.softplus to make the logistic prior calculation numerically stable.

leviszhang avatar Jun 19 '20 03:06 leviszhang
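A minimal pure-Python sketch of the stabilization described above (function names are illustrative; `F.softplus` is PyTorch's built-in stable softplus). The standard logistic log-density can be written as `log p(h) = -softplus(h) - softplus(-h)`; computing it that way stays finite, while the naive `log(1 + exp(h))` overflows once `h` exceeds roughly 709, which is exactly how the loss turns into inf/nan:

```python
import math

def softplus(x):
    # numerically stable softplus: log(1 + e^x) computed without overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def logistic_logpdf(h):
    # log-density of a standard logistic prior:
    # log p(h) = -softplus(h) - softplus(-h)
    return -softplus(h) - softplus(-h)

def logistic_logpdf_naive(h):
    # direct translation of the same formula; math.exp(h) overflows
    # for large h, so the loss becomes inf and then nan
    return -math.log(1.0 + math.exp(h)) - math.log(1.0 + math.exp(-h))

print(logistic_logpdf(800.0))  # finite: -800.0
try:
    print(logistic_logpdf_naive(800.0))
except OverflowError as e:
    print("naive version overflows:", e)
```

The stable form works because `softplus(x) = max(x, 0) + log1p(exp(-|x|))` only ever exponentiates a non-positive number, so the exponential can underflow to 0 but never overflow. L1 regularization attacks the same symptom from the other side, by keeping the parameter norms (and hence `h`) from growing unboundedly.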