
nan loss in elmo_example.py

Open · fsonntag opened this issue on Oct 08 '18 · 2 comments

First of all thanks for this great library!

System information

  • OS Platform and Distribution: macOS 10.13.5 and CentOS 7
  • TensorFlow/Keras versions: 1.11.0 / 2.2.4
  • Python version: 3.6.5

Describe the problem

I tried running elmo_example.py with the standard settings on the CoNLL-2003 data. After the first or second batch of the first epoch, the loss becomes nan. Reducing the learning rate only postpones the problem to a later batch, so I'm curious whether an important parameter is missing from the provided training script.
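For reference, this is roughly how I lowered the learning rate. The optimizer choice, the clipnorm value, and the TerminateOnNaN callback are my own additions on top of the script, not something elmo_example.py does itself:

from keras.optimizers import Adam
from keras.callbacks import TerminateOnNaN

# Lowered learning rate plus gradient clipping; clipnorm is a standard
# Keras optimizer argument that caps the gradient norm, which often
# tames exploding gradients (a common cause of nan losses).
optimizer = Adam(lr=0.0001, clipnorm=1.0)

# Fail fast: abort the fit as soon as the loss becomes nan.
callbacks = [TerminateOnNaN()]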

Source code / logs

Just the source of elmo_example.py.

Using TensorFlow backend.
Loading dataset...
Transforming datasets...
Loading word embeddings...
Building a model.
Training the model...
Epoch 1/1

  1/548 [..............................] - ETA: 1:07:43 - loss: 7.2010
  2/548 [..............................] - ETA: 34:25 - loss: 6.8820  
  3/548 [..............................] - ETA: 23:47 - loss: 10.1237
  4/548 [..............................] - ETA: 25:59 - loss: 11.2332
  5/548 [..............................] - ETA: 29:44 - loss: nan    
  6/548 [..............................] - ETA: 31:40 - loss: nan

fsonntag · Oct 08 '18, 11:10

Hi, I ran into something similar while training an anago.Sequence model on my own data: after a few batches, the loss would be reported as nan.

In my case it was caused by empty token sequences in the training data; filtering them out did the trick.
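Something like this (x_train and y_train stand for whatever your token and tag lists are called):

# Drop sentences with no tokens, keeping tags aligned with tokens.
pairs = [(x, y) for x, y in zip(x_train, y_train) if len(x) > 0]
x_train, y_train = map(list, zip(*pairs))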

Hope this helps.

bohana · Oct 23 '18, 16:10

I have removed the empty token sequences, but it still doesn't help. Could the nan loss be caused by a very high proportion of 'O' tags compared to the 'B' and 'I' tags?
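For reference, a quick way to check how skewed the tags are (assuming y_train is a list of tag sequences such as [['B-PER', 'I-PER', 'O'], ...]):

from collections import Counter

# Count every tag across all training sequences to see how dominant
# the 'O' class is compared to the 'B-*' and 'I-*' classes.
counts = Counter(tag for seq in y_train for tag in seq)
total = sum(counts.values())
for tag, n in counts.most_common():
    print('{:10s} {:8d} ({:.1%})'.format(tag, n, n / total))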

amarnamarpan · Apr 04 '19, 09:04