chainer-caption: Loss function is NaN while training after preprocessing the dataset

Using the command in personal_notes.txt, I preprocessed the data and tried to train the model, but only the first epoch has a loss of 7.xxx; after that, everything is just 0.

Nov 30 '18 22:11 sandy2008

I don't recommend using the command in personal_notes.txt because it's really just a personal note :) The command in the Readme should work better.

In general, a NaN loss means the learning rate is too high, so you could try decreasing it.
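
A toy illustration of why a too-large learning rate ends in NaN. This is plain Python, not chainer-caption's training code; the loss f(x) = x*x and the function name gradient_descent are made up for the example. Each gradient step multiplies x by (1 - 2*lr), so for lr > 1 the magnitude of x keeps growing until it overflows to inf, and the next update computes inf - inf, which is nan.

```python
import math


def gradient_descent(lr, steps=2000, x0=1.0):
    """Minimize the toy loss f(x) = x*x with plain gradient descent.

    Returns the list of losses seen, stopping early once the loss is NaN.
    """
    x = x0
    losses = []
    for _ in range(steps):
        loss = x * x          # float x*x overflows to inf instead of raising, unlike x ** 2
        losses.append(loss)
        if math.isnan(loss):
            break             # once NaN appears, every later value is NaN too
        grad = 2.0 * x        # derivative of x*x
        x = x - lr * grad     # equivalent to x *= (1 - 2*lr)
    return losses
```

gradient_descent(0.1) shrinks the loss towards 0 over all 2000 steps, while gradient_descent(1.5) stops early with nan as its last loss. In chainer the learning rate is a parameter of the optimizer object (for example alpha in chainer.optimizers.Adam); which optimizer train_caption_model.py uses is not shown in this thread.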

Dec 01 '18 23:12 apple2373

Thank you so much for your reply! I tried to train it following Readme.txt, but the captions file ./data/MSCOCO/MSCOCO/mscoco_caption_train2014_processed.json is not downloaded by the shell script.

Dec 01 '18 23:12 sandy2008

Oops! Sorry, you are the first one to ask for mscoco_caption_train2014_processed.json!

I uploaded it here: https://www.dropbox.com/s/1a7cgetjyb8h8ho/mscoco_caption_train2014_processed.json?dl=0
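
The link above ends in ?dl=0, which opens Dropbox's preview page. For share links of this form, changing it to dl=1 makes Dropbox serve the file itself, which is what a script, wget or curl needs. A minimal sketch of that rewrite in Python; the helper name dropbox_direct_link is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dropbox_direct_link(url):
    """Return the share link with dl=1 so Dropbox sends the file instead of a preview page."""
    parts = urlsplit(url)
    # Drop any existing dl parameter, keep the rest of the query, then force dl=1.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, urllib.request.urlretrieve(dropbox_direct_link(url), "mscoco_caption_train2014_processed.json") would save the file locally; it then has to be moved to the path the --captions argument points to.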

Dec 02 '18 02:12 apple2373

Thank you so much for your reply! By the way, is this the right script to generate the dictionary and preprocessed data for Japanese?


python train_caption_model.py --savedir ./experiment1jp_yj --epoch 40 --batch 120 --gpu 0 \
--vocab ./data/MSCOCO/yjcaptions26k_clean_processed_dic.json \
--captions ./data/MSCOCO/yjcaptions26k_clean_processed.json \
--preload True

Dec 02 '18 04:12 sandy2008

Let me download it and test it again ;)

Dec 02 '18 04:12 sandy2008

I don't think that command generates preprocessed data. It looks fine to me, but it's a training command.

Dec 02 '18 21:12 apple2373

Sorry, I copied and pasted the wrong one!!!

python preprocess_MSCOCO_captions.py \
--input ../data/MSCOCO/yjcaptions26k_clean.json \
--output ../data/MSCOCO/yjcaptions26k_clean_processed.json \
--outdic ../data/MSCOCO/yjcaptions26k_clean_processed_dic.json \
--outfreq ../data/MSCOCO/yjcaptions26k_clean_processed_freq.json \
--cut 0 \
--char True
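
Why --char True matters for Japanese: the text is normally written without spaces between words, so splitting on whitespace would turn a whole caption into one token. Below is a rough, hypothetical sketch of character-level vocabulary building; it is not the code of preprocess_MSCOCO_captions.py. It assumes that --cut is a minimum-frequency threshold (characters seen cut times or fewer are dropped, so --cut 0 keeps everything), and the special tokens <sos>, <eos> and <unk> are made-up names.

```python
from collections import Counter

# Hypothetical special tokens; the real script may use different names.
SPECIAL_TOKENS = ["<sos>", "<eos>", "<unk>"]


def build_char_vocab(captions, cut=0):
    """Build a character-level vocabulary from a list of caption strings.

    Assumption: characters seen `cut` times or fewer are dropped, so cut=0 keeps all.
    Returns (vocab, freq): vocab maps token -> id, freq maps character -> count.
    """
    freq = Counter(ch for caption in captions for ch in caption if not ch.isspace())
    vocab = {token: i for i, token in enumerate(SPECIAL_TOKENS)}
    # Most frequent characters get the smallest ids; ties are broken by the character itself.
    for ch, count in sorted(freq.items(), key=lambda item: (-item[1], item[0])):
        if count > cut:
            vocab[ch] = len(vocab)
    return vocab, dict(freq)


def encode(caption, vocab):
    """Turn a caption into ids wrapped in <sos>/<eos>; unknown characters become <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(ch, unk) for ch in caption if not ch.isspace()]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]
```

With cut=0 every character gets its own id; raising cut shrinks the vocabulary and maps rare characters to <unk>.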

Dec 03 '18 04:12 sandy2008

Yes, this is correct.

Dec 04 '18 21:12 apple2373

But if I run this command and then train the model, the learning rate is nan ;) Do you have any idea why this would happen? I will try testing it with other things.

Dec 05 '18 00:12 sandy2008

I don't think the learning rate can be nan. When I train, the loss quickly decreases to around 3.

Did you modify any of the code? You could clone the repository again and try from scratch. Also, remember to use exactly the same chainer version, 1.19.0.
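
Since the exact chainer version matters here, a quick check before a long training run can rule it out. A minimal sketch that reads the installed module's __version__ (chainer exposes chainer.__version__); the helper names installed_version and versions_match are hypothetical. Installing the pinned version itself would be pip install chainer==1.19.0.

```python
import importlib


def installed_version(package):
    """Return the package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def versions_match(found, required):
    """True only when a version was found and it equals the required one exactly."""
    return found is not None and found == required


if __name__ == "__main__":
    found = installed_version("chainer")
    if versions_match(found, "1.19.0"):
        print("chainer 1.19.0 found")
    else:
        print("expected chainer 1.19.0, found %s" % found)
```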

Dec 05 '18 00:12 apple2373

Cool, I will do it later. There is now also another Japanese caption dataset for MSCOCO: https://stair.center/archives/338. I will test with both the old dataset and the new one ;)

Dec 05 '18 00:12 sandy2008
