
loss: nan from the start of training

jenniferzhu opened this issue 7 years ago • 12 comments

For the English datasets, the loss is nan from the moment training starts. I tried matching the Keras and TensorFlow versions (except that I use CPU), but it doesn't help. I also tried different datasets and optimizers; none of them helps. Can you please share some thoughts on troubleshooting?

FYI, every output matches the "NER-using-Deep-Learning" notebook up to m.train(epochs=10).

jenniferzhu avatar Aug 21 '17 01:08 jenniferzhu

I hope you have all the spaCy data downloaded (it is needed for the word vectors).
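A quick sanity check that the vectors are really there (a minimal sketch for spaCy 1.x; with missing data the vectors silently come back as all zeros):

```python
# Sketch: verify the spaCy 1.x English model and its word vectors are installed.
# spaCy 1.x data download: `python -m spacy.en.download all`
import spacy

nlp = spacy.load('en')
token = nlp(u'apple')[0]
print(token.has_vector)   # should be True
print(token.vector[:5])   # should be non-zero floats, not all zeros
```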

Different versions can certainly be a problem, but I also faced a similar issue while working on this. In my case, the value of dropout was causing trouble. Try replacing m.make_and_compile() with m.make_and_compile(units=100, dropout=0.0, regul_alpha=0.0001) in the main IPython notebook. Also, for better results, if you have enough memory, you can increase the word vector dimension to 300: change the value of self.LEN_WORD_VECTORS (in process_data.py) to 300.
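For reference, a sketch of the two edits described above (the names are taken straight from this repo's notebook and process_data.py):

```python
# In the main notebook: compile with explicit hyper-parameters instead of the defaults.
m.make_and_compile(units=100, dropout=0.0, regul_alpha=0.0001)

# In process_data.py: bump the word vector dimension (only if you have enough memory).
self.LEN_WORD_VECTORS = 300
```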

pandeydivesh15 avatar Aug 21 '17 05:08 pandeydivesh15

Thanks for the prompt reply, Divesh! Do you mind checking which version of the spaCy library you are using? I did notice that my spaCy gave slightly different results in the spaCy tests, although no errors were reported. And thank you again for the suggestions on the Keras code. I will try that and see what happens.

jenniferzhu avatar Aug 21 '17 06:08 jenniferzhu

You’re welcome, @jenniferzhu. I am currently using spacy==1.6.0.

pandeydivesh15 avatar Aug 21 '17 06:08 pandeydivesh15

Hi Divesh, I modified the code as you suggested. It does solve the "loss: nan" issue, but now the model predicts the category "O" for every token. I guess dropout helps with the imbalanced dataset. Are there other things you did to balance the dataset?
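A quick sketch I used to quantify the imbalance (`labels` is a hypothetical flat list of per-token tags; adapt it to however the repo stores them):

```python
# Sketch: count per-token tag frequencies to see how dominant 'O' is.
# `labels` is a hypothetical flat list of tags, e.g. ['O', 'O', 'B-PER', ...].
from collections import Counter

counts = Counter(labels)
total = float(sum(counts.values()))
for tag, n in counts.most_common():
    print('%-8s %6d  %5.1f%%' % (tag, n, 100.0 * n / total))
```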

jenniferzhu avatar Aug 27 '17 18:08 jenniferzhu

Do your versions of Keras and TensorFlow match Keras==1.2.1 and tensorflow==0.12.1?
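You can verify with something like:

```python
# Print the installed versions; expect 1.2.1 and 0.12.1 to match my setup.
import keras
import tensorflow

print(keras.__version__)       # 1.2.1
print(tensorflow.__version__)  # 0.12.1
```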

pandeydivesh15 avatar Aug 27 '17 19:08 pandeydivesh15

You can try changing regul_alpha while keeping dropout at 0. Also, if you have sufficient memory, try changing the length of the vectors to 300; it should improve results. Regarding the imbalanced dataset, try pruning it using some conditions, e.g. maximum sentence length below a threshold, at least one entity in every sentence, etc.
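A sketch of that kind of pruning (the `sentences` structure here is hypothetical; adapt it to the repo's data format):

```python
# Sketch: prune the training data to reduce the imbalance.
# `sentences` is a hypothetical list of (tokens, tags) pairs, tags like ['O', 'B-PER', ...].
MAX_LEN = 50  # example threshold on sentence length

def keep(tokens, tags):
    # Drop overly long sentences and sentences with no entity at all.
    return len(tokens) <= MAX_LEN and any(t != 'O' for t in tags)

pruned = [(tokens, tags) for tokens, tags in sentences if keep(tokens, tags)]
```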

pandeydivesh15 avatar Aug 27 '17 19:08 pandeydivesh15

These are great tips! I got results similar to yours by reinstalling spaCy and using your parameters! Thank you, Divesh! BTW, do you have any recommended webpages where I can read up on why and how to tune those parameters?

jenniferzhu avatar Aug 27 '17 20:08 jenniferzhu

Currently, I don't have any specific webpages (for NER), but if you like, you can go through these links: "How to choose a neural network's hyper-parameters?" (http://neuralnetworksanddeeplearning.com/chap3.html#how_to_choose_a_neural_network's_hyper-parameters) and distill.pub (https://distill.pub/).

pandeydivesh15 avatar Aug 28 '17 03:08 pandeydivesh15

This is great! Thanks Divesh. You’re an expert!

jenniferzhu avatar Aug 28 '17 03:08 jenniferzhu

definitely not an expert :smile:

pandeydivesh15 avatar Aug 28 '17 06:08 pandeydivesh15

@pandeydivesh15 Your suggestions worked great for the English dataset, but the same issue occurred again when I switched datasets. I am trying to understand what your logic was when you tuned the model, in order to avoid the model predicting "O" for all cases. I tried a grid search over learning rates, but it did not help. Can you please share your logic for tuning the model?
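For reference, my grid search was along these lines (a sketch only; `m.model` is an assumed handle to the underlying compiled Keras model):

```python
# Sketch of a learning-rate grid search (Keras 1.x API).
# `m.model` is a hypothetical attribute exposing the compiled Keras model.
from keras.optimizers import Adam

for lr in [1e-2, 1e-3, 1e-4]:
    m.make_and_compile(units=100, dropout=0.0, regul_alpha=0.0001)
    # Recompile with a different learning rate before training.
    m.model.compile(optimizer=Adam(lr=lr), loss='categorical_crossentropy',
                    metrics=['accuracy'])
    m.train(epochs=10)
```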

jenniferzhu avatar Sep 06 '17 00:09 jenniferzhu

Sorry for the late reply. I had no specific logic while tuning the model. The arguments to m.make_and_compile() were an important factor, the most important one being dropout. In my case, setting dropout helped on the Hindi dataset while it was failing for English.

pandeydivesh15 avatar Sep 13 '17 13:09 pandeydivesh15