chainer-caption

Loss function is NaN while training after preprocessing the dataset

Open sandy2008 opened this issue 6 years ago • 11 comments

Using the command in personal_notes.txt, I preprocessed the data and tried to train the model. Only the first epoch reports a loss of 7.xxx; after that, everything is just 0.

sandy2008 avatar Nov 30 '18 22:11 sandy2008

I don't recommend using the command in personal_notes.txt because it's really just a personal note :) The command in the Readme should work better.

In general, a NaN loss usually means the learning rate is too high, so you could try decreasing the learning rate.
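
To illustrate why a high learning rate can end in NaN, here is a minimal sketch in plain Python. It is not code from this repository: the quadratic loss, the `train` function, and the two learning rates are assumptions chosen only for the demonstration. With a step size that is too large, the weight overshoots further on every update, overflows to infinity, and the next update computes `inf - inf`, which is NaN.

```python
import math


def train(lr, steps=2000):
    """Run plain gradient descent on the toy loss f(w) = w * w.

    Hypothetical example, unrelated to the captioning model itself.
    Returns the loss recorded after every update.
    """
    w = 2.0
    losses = []
    for _ in range(steps):
        grad = 2.0 * w          # derivative of w * w
        w = w - lr * grad       # standard gradient descent update
        losses.append(w * w)    # overflows to inf, later becomes nan
    return losses


def first_non_finite(losses):
    """Return the index of the first inf/nan loss, or None if all are finite."""
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i
    return None


if __name__ == "__main__":
    for lr in (1.5, 0.1):
        history = train(lr)
        print("lr =", lr,
              "final loss =", history[-1],
              "first non-finite step =", first_non_finite(history))
```

Logging the first step at which the loss stops being finite, as `first_non_finite` does here, is a cheap way to tell whether training diverged gradually or broke on the very first batches.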

apple2373 avatar Dec 01 '18 23:12 apple2373

Thank you so much for your reply! I tried to train it with the command from the Readme, but the captions file ./data/MSCOCO/MSCOCO/mscoco_caption_train2014_processed.json is not downloaded by the shell script.

sandy2008 avatar Dec 01 '18 23:12 sandy2008

Oops, sorry! You are the first one to ask about mscoco_caption_train2014_processed.json!

I uploaded it here: https://www.dropbox.com/s/1a7cgetjyb8h8ho/mscoco_caption_train2014_processed.json?dl=0

apple2373 avatar Dec 02 '18 02:12 apple2373

Thank you so much for your reply! Btw, is this the right script to generate the dictionary and preprocessed data for Japanese?

python train_caption_model.py --savedir ./experiment1jp_yj --epoch 40 --batch 120 --gpu 0 \
--vocab ./data/MSCOCO/yjcaptions26k_clean_processed_dic.json \
--captions ./data/MSCOCO/yjcaptions26k_clean_processed.json \
--preload True

sandy2008 avatar Dec 02 '18 04:12 sandy2008

Let me download it and test it again ;)

sandy2008 avatar Dec 02 '18 04:12 sandy2008

I don't think your command generates preprocessed data. It looks fine to me, but it's a training command.

apple2373 avatar Dec 02 '18 21:12 apple2373

Sorry I copied and pasted the wrong one!!!

python preprocess_MSCOCO_captions.py \
--input ../data/MSCOCO/yjcaptions26k_clean.json \
--output ../data/MSCOCO/yjcaptions26k_clean_processed.json \
--outdic ../data/MSCOCO/yjcaptions26k_clean_processed_dic.json \
--outfreq ../data/MSCOCO/yjcaptions26k_clean_processed_freq.json \
--cut 0 \
--char True

sandy2008 avatar Dec 03 '18 04:12 sandy2008

Yes, this is correct.

apple2373 avatar Dec 04 '18 21:12 apple2373

And the learning rate is nan if I run this command and then train the model ;) Do you have any idea why this would happen? I will try testing it with other things.

sandy2008 avatar Dec 05 '18 00:12 sandy2008

I don't think the learning rate can be nan. Well, when I do the training, the loss quickly decreases to around 3.

Did you modify any of the code? You could clone the repository again and try from scratch. Also remember to use exactly the same Chainer version, 1.19.0.
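
A quick way to confirm the version before retraining is sketched below. The helper names `installed_chainer_version` and `matches_expected` are hypothetical and not part of the repository; the only library detail relied on is the `chainer.__version__` attribute.

```python
def installed_chainer_version():
    """Return the installed Chainer version string, or None if not installed."""
    try:
        import chainer
    except ImportError:
        return None
    return chainer.__version__


def matches_expected(found, expected="1.19.0"):
    """True only when the found version is exactly the expected one."""
    return found == expected


if __name__ == "__main__":
    found = installed_chainer_version()
    if matches_expected(found):
        print("Chainer version matches:", found)
    else:
        print("Expected Chainer 1.19.0 but found:", found)
```

If the versions differ, reinstalling the pinned release in a fresh environment is probably easier than debugging API differences between releases.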

apple2373 avatar Dec 05 '18 00:12 apple2373

Cool, I will do it later. There is now another Japanese dataset for MSCOCO: https://stair.center/archives/338. I will test with both the old dataset and the new one ;)

sandy2008 avatar Dec 05 '18 00:12 sandy2008