ba-dls-deepspeech
Loss becomes nan after a while

Why is this happening, and how can I solve it?
Try using the Keras optimizer rather than Lasagne's.
What would be the exact changes in the code?
Same here. I have edited model.py as follows (rough sketch after the list):
- Comment out import lasagne
- Uncomment from keras.optimizers import SGD
- Comment out grads = lasagne.updates.total_norm_constraint...
- Comment out updates = lasagne.updates.nesterov_momentum...
- Uncomment optimizer = SGD(nesterov=True, lr=learning_rate,...
- Uncomment updates = optimizer.get_updates(...
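Roughly, the edited section of model.py ends up looking something like this. This is a minimal sketch, not the exact diff: the momentum and clipnorm values, and variable names such as cost and trainable_vars, are assumptions for illustration, using the Keras 1.x API.

```python
# model.py (sketch): switch from the Lasagne update rules to the Keras SGD
# optimizer. `cost`, `trainable_vars`, and `learning_rate` are assumed to
# already be defined as in the original training setup.

# import lasagne                                           # commented out
from keras.optimizers import SGD                           # uncommented

# grads = lasagne.updates.total_norm_constraint(...)       # commented out
# updates = lasagne.updates.nesterov_momentum(...)         # commented out

optimizer = SGD(nesterov=True, lr=learning_rate,           # uncommented
                momentum=0.9, clipnorm=100)
# Keras 1.x signature: get_updates(params, constraints, loss)
updates = optimizer.get_updates(trainable_vars, {}, cost)  # uncommented
```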
I'm seeing this issue too. I switched to using the Keras optimizer instead of Lasagne's, making the same changes that @aglotero cited above.
For the first 8990 (out of 12188) iterations the loss was behaving properly. Then, starting at around iteration 9000, I started seeing NaNs:
...
2016-12-07 04:38:33,080 INFO (__main__) Epoch: 0, Iteration: 8960, Loss: 148.405151367
2016-12-07 04:40:38,369 INFO (__main__) Epoch: 0, Iteration: 8970, Loss: 356.538299561
2016-12-07 04:42:43,709 INFO (__main__) Epoch: 0, Iteration: 8980, Loss: 382.034057617
2016-12-07 04:44:49,189 INFO (__main__) Epoch: 0, Iteration: 8990, Loss: 310.213592529
2016-12-07 04:58:47,111 INFO (__main__) Epoch: 0, Iteration: 9000, Loss: nan
Interestingly, the loss spiked at iteration 8960. Here is the plot for the first 9000 iterations.
Some notes: I am using dropout on the RNN layers, which is reflected in the plot, and I increased the amount of training data by raising the max duration to 15.0 seconds. My mini-batch size is 24.
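(For anyone trying to reproduce this: a minimal sketch of what adding dropout to the recurrent layers might look like with the Keras 1.x GRU layer. The 0.2 rates are placeholders, not the values I used.)

```python
from keras.layers import GRU

# Keras 1.x GRU exposes dropout on the input projections (dropout_W)
# and on the recurrent connections (dropout_U); the rates below are
# illustrative assumptions.
recurrent = GRU(1000, return_sequences=True, activation='relu',
                dropout_W=0.2, dropout_U=0.2)
```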
Using the SGD optimizer with the clipnorm=1 option may be a solution.
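For example, a minimal sketch of that change (the learning rate here is the 2e-4 default mentioned elsewhere in this thread; the momentum value is an assumption):

```python
from keras.optimizers import SGD

# Same optimizer as before, but clipping the global gradient norm to 1
# instead of the default 100 used elsewhere in this thread.
optimizer = SGD(lr=2e-4, momentum=0.9, nesterov=True, clipnorm=1)
```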
I was getting a NaN cost at the 400th iteration; now I'm at the 3690th iteration and still running.
I saw a similar issue at https://github.com/fchollet/keras/issues/1244
FWIW I fixed this by dropping the learning rate and removing the dropout layers I added. I left the clipnorm value at 100.
Hi!
I changed the clipnorm to 1 as @aglotero suggested, but with more GRU layers (1 convolutional layer, 7 GRU layers with 1000 nodes each, and 1 fully connected layer).
I found that the loss converges but gets stuck at about 300, and the visualized test results are really bad!
Does that mean the structure is not deep enough, or should I train for more epochs?
Thanks!
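For context, here is a rough sketch of the architecture described above (one 1D convolution, seven GRU layers with 1000 nodes each, and one time-distributed fully connected output layer) in the Keras 1.x style this project uses. This is not necessarily the repo's actual model code; the input/output dimensions, filter length, and stride are assumptions for illustration.

```python
from keras.layers import Input, GRU, TimeDistributed, Dense
from keras.layers.convolutional import Convolution1D
from keras.models import Model

input_dim = 161   # spectrogram features per frame (assumption)
output_dim = 29   # characters plus the CTC blank (assumption)

acoustic_input = Input(shape=(None, input_dim), name='acoustic_input')

# 1D convolution over time; filter length and stride are placeholders
x = Convolution1D(1000, 11, border_mode='valid', subsample_length=2,
                  activation='relu', name='conv1d')(acoustic_input)

# 7 stacked GRU layers with 1000 nodes each
for i in range(7):
    x = GRU(1000, return_sequences=True, activation='relu',
            name='gru_%d' % (i + 1))(x)

# Per-timestep fully connected layer producing character scores
network_output = TimeDistributed(Dense(output_dim),
                                 name='network_output')(x)

model = Model(input=acoustic_input, output=network_output)
```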
@a00achild1 I don't think you want clipnorm set to 1. Were you getting NaNs before with the clipnorm set to a higher value (~100)?
@dylanbfox thanks for the quick response! When the clipnorm is at the default value (100), I get NaNs after some iterations. Then I came across this issue and tried training the model with clipnorm 1.
Why do you think setting clipnorm to 1 is not a good idea? Does such a small clipnorm value hurt performance?
What is your learning rate? Try dropping that and keeping the clipnorm higher.
@dylanbfox my learning rate is 2e-4, the default value. In my experience that is already quite small, but maybe I am wrong. I will try a smaller value while keeping the clipnorm higher. Thanks!
I set the learning rate to 2e-4 and the clipnorm back to 100, trained on LibriSpeech-clean-100, and my model structure is 1 convolutional layer, 7 GRU layers with 1000 nodes each, and 1 fully connected layer, following Baidu's paper.
While the training loss kept dropping, the validation loss started to diverge. The prediction on a test file is better than before, but it still can't produce correct words. Has anyone trained a good model for speech recognition, or does anyone have suggestions? Any suggestion would be really appreciated!
P.S. Could the problem still be the clipnorm? I've been searching for a while, but there doesn't seem to be a principled way to choose the clip value.
I have a similar problem to the ones described above, but my model produces NaN values after the first iteration. I tried changing the optimizer (Keras and Lasagne), the clipnorm (1 and 100), and the learning rate (2e-4 and 0.01), but the cost is still NaN. Can anyone advise on this problem? I would really appreciate a solution. I am using Keras 1.0.7 and Theano rel-0.8.2; if you think these versions are not appropriate, please let me know.
Ex. Keras, learning_rate=2e-4, clipnorm=1
2017-01-09 01:27:52,611 INFO (__main__) Epoch: 0, Iteration: 0, Loss: 241.261184692
2017-01-09 01:28:00,360 INFO (__main__) Epoch: 0, Iteration: 1, Loss: nan
2017-01-09 01:28:07,864 INFO (__main__) Epoch: 0, Iteration: 2, Loss: nan
2017-01-09 01:28:15,374 INFO (__main__) Epoch: 0, Iteration: 3, Loss: nan
2017-01-09 01:28:23,191 INFO (__main__) Epoch: 0, Iteration: 4, Loss: nan
2017-01-09 01:28:31,301 INFO (__main__) Epoch: 0, Iteration: 5, Loss: nan
2017-01-09 01:28:39,587 INFO (__main__) Epoch: 0, Iteration: 6, Loss: nan
2017-01-09 01:28:48,127 INFO (__main__) Epoch: 0, Iteration: 7, Loss: nan
2017-01-09 01:28:56,824 INFO (__main__) Epoch: 0, Iteration: 8, Loss: nan
2017-01-09 01:29:05,442 INFO (__main__) Epoch: 0, Iteration: 9, Loss: nan
2017-01-09 01:29:14,783 INFO (__main__) Epoch: 0, Iteration: 10, Loss: nan
2017-01-09 01:29:23,937 INFO (__main__) Epoch: 0, Iteration: 11, Loss: nan
I just found a solution! You need Keras 1.1.0 or a later version of the Keras package.
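(A minimal sketch of a runtime check for this, in case it helps anyone else hitting the same thing; the assertion message is just illustrative.)

```python
# Verify the installed Keras is at least 1.1.0, since the 1.0.x releases
# reportedly produced NaN losses in this setup.
from distutils.version import LooseVersion
import keras

assert LooseVersion(keras.__version__) >= LooseVersion('1.1.0'), \
    'Please upgrade Keras to 1.1.0 or newer'
```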
@a00achild1 Hey, did you find out why your loss curve turned out like that? I'm currently at that stage.