nan loss in elmo_example.py
First of all thanks for this great library!
System information
- **OS Platform and Distribution**: macOS 10.13.5 and CentOS 7
- **TensorFlow/Keras version**: 1.11.0, 2.2.4
- **Python version**: 3.6.5
Describe the problem
I tried running elmo_example.py with the standard settings on the CoNLL 2003 data. After the first or second batch of the first epoch, I immediately run into a nan loss. Reducing the learning rate only postpones the problem to a later batch. So I'm curious whether some important parameter is missing from the provided training script.
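For reference, lowering the learning rate looked roughly like this (a minimal sketch: where exactly the example compiles its underlying Keras model is my assumption about the script, and the `clipnorm` gradient clipping is an extra mitigation I tried, not something the original script sets):

```python
from keras.optimizers import Adam

# Sketch: recompile the Keras model built inside elmo_example.py with a
# smaller learning rate and gradient clipping. `model` and `loss` stand
# for the objects the example script creates; this is an assumption
# about the script, not a documented anago API.
optimizer = Adam(lr=1e-4, clipnorm=1.0)  # clipnorm caps the gradient L2 norm
model.compile(loss=loss, optimizer=optimizer)
```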
Source code / logs
Just the source of elmo_example.py.
```
Using TensorFlow backend.
Loading dataset...
Transforming datasets...
Loading word embeddings...
Building a model.
Training the model...
Epoch 1/1
1/548 [..............................] - ETA: 1:07:43 - loss: 7.2010
2/548 [..............................] - ETA: 34:25 - loss: 6.8820
3/548 [..............................] - ETA: 23:47 - loss: 10.1237
4/548 [..............................] - ETA: 25:59 - loss: 11.2332
5/548 [..............................] - ETA: 29:44 - loss: nan
6/548 [..............................] - ETA: 31:40 - loss: nan
```
Hi, I ran into something similar while training an anago.Sequence model on my own data: after a few batches of training, the loss would be reported as nan.
In my case it was caused by empty token sequences in the training data; filtering them out did the trick.
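Something like the following worked for me (a minimal sketch; it assumes `x_train` and `y_train` are the parallel lists of token sequences and tag sequences that anago's training expects, e.g. as loaded from the CoNLL files):

```python
# Drop empty token sequences (and their parallel tag sequences)
# before training; a zero-length sequence can poison the loss.
pairs = [(x, y) for x, y in zip(x_train, y_train) if len(x) > 0]
x_train, y_train = (list(seqs) for seqs in zip(*pairs))
```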
Hope this helps.
I have removed the empty token sequences, but it still doesn't help. Could the problem be caused by a very high number of 'O' tags compared to the 'B' and 'I' tags?