mx-lsoftmax

NaN value

Open · auroua opened this issue · 6 comments

I reimplemented L-Softmax in TensorFlow, but I find it hard to train: it very easily produces NaN values. How can I avoid this?

auroua, May 09 '17 02:05
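A common source of NaN in a hand-written L-Softmax (an assumption here, since the TensorFlow code isn't posted) is computing theta = acos(cos theta). Rounding can push cos theta just past +/-1, where acos is undefined, and even after clipping the derivative -1/sqrt(1 - cos^2 theta) is infinite at +/-1. The plain-Python sketch below avoids acos entirely: it expands cos(m theta) as a polynomial in cos theta, finds the interval index k by comparing cos theta with the thresholds cos(j*pi/m), clips cos theta to [-1, 1], and floors the norm product at a small eps (a hypothetical value). The function names and the `beta` blending parameter are illustrative, not this repository's API.

```python
import math


def cos_m_theta(cos_t: float, m: int) -> float:
    """cos(m*theta) as a polynomial in cos(theta) (multiple-angle formula).

    cos(m t) = sum_n (-1)^n * C(m, 2n) * cos(t)^(m-2n) * (1 - cos(t)^2)^n
    No acos is involved, so nothing blows up at cos(t) = +/-1.
    """
    sin2_t = 1.0 - cos_t * cos_t
    return sum(
        (-1) ** n * math.comb(m, 2 * n) * cos_t ** (m - 2 * n) * sin2_t ** n
        for n in range(m // 2 + 1)
    )


def find_k(cos_t: float, m: int) -> int:
    """Return k such that theta lies in [k*pi/m, (k+1)*pi/m].

    cos is decreasing on [0, pi], so compare cos_t with the thresholds
    cos((k+1)*pi/m) instead of computing theta = acos(cos_t).
    """
    for k in range(m):
        if cos_t >= math.cos((k + 1) * math.pi / m):
            return k
    return m - 1  # defensive fallback; not expected after clipping


def lsoftmax_target_logit(w, x, m, beta=0.0, eps=1e-12):
    """Target-class logit with the angular margin, guarded against NaN.

    f = (beta * |w||x| cos(theta) + |w||x| psi(theta)) / (1 + beta)
    psi(theta) = (-1)^k cos(m theta) - 2k
    """
    norm = math.sqrt(sum(v * v for v in w)) * math.sqrt(sum(v * v for v in x))
    # Flooring the denominator keeps the division finite for zero vectors.
    cos_t = sum(a * b for a, b in zip(w, x)) / max(norm, eps)
    # Rounding can push |cos_t| slightly above 1; clip it back into range.
    cos_t = min(1.0, max(-1.0, cos_t))
    k = find_k(cos_t, m)
    psi = (-1) ** k * cos_m_theta(cos_t, m) - 2 * k
    return (beta * norm * cos_t + norm * psi) / (1.0 + beta)
```

With m = 1 the margin term reduces to cos theta, so the function returns the ordinary inner product w.x. In TensorFlow the same two guards would be `tf.clip_by_value` on the cosine and `tf.maximum(norm, eps)` on the denominator.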

The training loss output:

step 3600, training loss 1.00463
step 3700, training loss 0.970242
step 3800, training loss 0.906492
step 3900, training loss 0.0988686
step 4000, training loss 0.00080747
step 4100, training loss 0.000444604
step 4200, training loss 0.000204534
step 4300, training loss 0.000227041
step 4400, training loss 0.000149651
step 4500, training loss 0.000162705
step 4600, training loss 9.66944e-05
step 4700, training loss 8.69876e-05
step 4800, training loss 6.04607e-05
step 4900, training loss 8.24705e-05
step 5000, training loss 6.02255e-05
step 5100, training loss 4.36621e-05
step 5200, training loss 3.89259e-05
.....
step 18500, training loss 3.72529e-09
step 18600, training loss 7.45058e-09
step 18700, training loss 2.79397e-09
step 18800, training loss 5.58794e-09
step 18900, training loss 3.72529e-09
step 19000, training loss 9.31323e-10
step 19100, training loss 1.86265e-09
step 19200, training loss 2.79397e-09
step 19300, training loss 1.86265e-09
step 19400, training loss 4.65661e-09
step 19500, training loss 2.79397e-09
step 19600, training loss 5.58794e-09
step 19700, training loss 9.31323e-10
step 19800, training loss 1.86265e-09
step 19900, training loss 9.31323e-10
step 20000, training loss -0

What does a loss value of -0 mean?

auroua, May 09 '17 06:05
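One plausible explanation for both symptoms in this thread (hedged: the TensorFlow loss code isn't shown) is a cross-entropy written by hand as -reduce_sum(y * log(softmax(z))). When the target probability rounds to exactly 1.0, its term is 1.0 * log(1.0) = 0.0, every other term is 0.0 times a negative finite number (which is -0.0), the whole sum is exactly 0.0, and the final negation turns it into -0.0: IEEE 754 negative zero, printed as "-0" and equal to 0. When a non-target probability underflows to 0.0 instead, that term is 0.0 * log(0) = 0.0 * -inf = NaN. The sketch below reproduces both in plain Python (float64, so the logits needed are larger than in float32) and contrasts them with the log-sum-exp form, which is the idea behind passing raw logits to `tf.nn.sparse_softmax_cross_entropy_with_logits` rather than probabilities. The function names are illustrative.

```python
import math


def naive_xent(logits, label):
    """-sum(y * log(softmax(z))) with a one-hot y, computed step by step.

    Mirrors a hand-written loss; log(0) is mapped to -inf as tf.log does.
    """
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    logs = [math.log(p) if p > 0.0 else -math.inf for p in probs]
    onehot = [1.0 if i == label else 0.0 for i in range(len(logits))]
    # 0.0 * -inf is NaN; 0.0 * (negative finite) is -0.0.
    return -sum(y * lp for y, lp in zip(onehot, logs))


def stable_xent(logits, label):
    """Cross-entropy straight from logits: logsumexp(z) - z[label]."""
    z_max = max(logits)
    lse = z_max + math.log(sum(math.exp(z - z_max) for z in logits))
    return lse - logits[label]


if __name__ == "__main__":
    print("%g" % naive_xent([50.0, 10.0], 0))    # target prob rounds to 1.0
    print(naive_xent([50.0, -800.0], 0))         # other prob underflows to 0.0
    print(stable_xent([50.0, -800.0], 0))        # finite
```

The first call prints "-0", matching the last line of the log above, and the second prints nan, while the log-sum-exp version stays finite on the same input.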

I have no experience with TensorFlow. Did you train the model on MNIST or another dataset? It's odd to see the loss go that low; what does the test loss look like? As for -0, sorry, I have no idea. Maybe it's related to how SoftmaxLoss is implemented in TensorFlow.

For training with LSoftmax, you may refer to the advice given by the author here.

luoyetx, May 09 '17 10:05

Thanks! I know nothing about MXNet. I want to test the MXNet code, but when I run the program I get the following error: AttributeError: module 'mxnet.symbol' has no attribute 'LSoftmax'. How can I solve it? I only installed the MXNet Python package via pip. Should I build the C++ version?

auroua, May 09 '17 15:05

I found it really hard to converge. I compared my cos_m_t output and other intermediate values, and they are the same, but my model still doesn't converge. When I set m to 1, which is equivalent to the original softmax, it converges; when I set m to 2, it is really hard to converge.

auroua, May 10 '17 12:05
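The m = 1 versus m = 2 observation fits how L-Softmax behaves: with m = 1, psi(theta) = cos(theta) and the loss is ordinary softmax, while m >= 2 imposes a much harder target from the very first step. A stabilization described, to our knowledge, in the L-Softmax paper is to blend the two target logits as f = (beta*|w||x|cos(theta) + |w||x|psi(theta)) / (1 + beta), starting with a large beta (close to plain softmax) and decaying it during training. The decay form and the defaults below (`base`, `gamma`, `power`, `beta_min`) are hypothetical illustrative choices, not values taken from this repository or the linked advice.

```python
def beta_schedule(step: int, base: float = 1000.0, gamma: float = 1e-3,
                  power: float = 1.0, beta_min: float = 0.0) -> float:
    """Hypothetical polynomial decay of beta, floored at beta_min."""
    return max(base * (1.0 + gamma * step) ** (-power), beta_min)


def margin_weight(beta: float) -> float:
    """Share of the margin term psi(theta) in the blended logit: 1 / (1 + beta)."""
    return 1.0 / (1.0 + beta)


if __name__ == "__main__":
    for step in (0, 1_000, 10_000, 100_000):
        beta = beta_schedule(step, beta_min=5.0)
        print(f"step {step:>6}: beta = {beta:8.2f}, "
              f"margin weight = {margin_weight(beta):.4f}")
```

Keeping `beta_min` above zero leaves part of the plain cos(theta) logit in place permanently, which trades some margin strength for easier optimization.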

@auroua Did you find a solution? I also have this problem.

dongliangchang, Jun 30 '17 12:06

@luoyetx @auroua @DL-Chang I also have this problem. How can I solve it?

zeroQiaoba, Jul 14 '18 10:07