chainer-caption
Loss function is NaN while training after preprocessing the dataset
I preprocessed the data using the command in personal_notes.txt and then tried to train the model, but only the first epoch has a loss of 7.xxx; after that, everything is just 0.
I don't recommend using the command in personal_notes.txt because it's really just a personal note :). The command in the Readme should work better.
In general, a NaN loss often means the learning rate is too high, so you could try decreasing the learning rate.
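To illustrate why a learning rate that is too high can end in a non-finite loss, here is a minimal toy sketch. It is not the training loop of this repository: the function name `run_gradient_descent`, the quadratic loss, and the two learning rates are all assumptions chosen only for illustration.

```python
import math


def run_gradient_descent(learning_rate, steps=2000, w=1.0):
    """Minimise the toy loss(w) = w * w with plain gradient descent.

    Returns (step, loss) at the first non-finite loss, or
    (steps, final_loss) if the loss stayed finite throughout.
    """
    for step in range(steps):
        loss = w * w
        if not math.isfinite(loss):
            # The loss overflowed to inf (or became NaN): training diverged.
            return step, loss
        grad = 2.0 * w                  # d(loss)/dw
        w = w - learning_rate * grad    # gradient descent update
    return steps, w * w


if __name__ == "__main__":
    # Hypothetical learning rates: 0.1 shrinks w each step, 1.5 doubles |w|.
    for lr in (0.1, 1.5):
        step, loss = run_gradient_descent(lr)
        status = "finite" if math.isfinite(loss) else "diverged"
        print("lr=%s: stopped at step %d, loss=%s (%s)" % (lr, step, loss, status))
```

In a real training loop, a similar `math.isfinite` check on the reported loss can stop a run early instead of training on with a broken model.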
Thank you so much for your reply! I tried to train with the command from the Readme.txt file, but the captions file ./data/MSCOCO/MSCOCO/mscoco_caption_train2014_processed.json is not downloaded by the shell script.
Oops, sorry! You are the first one to ask for mscoco_caption_train2014_processed.json!
I uploaded it here: https://www.dropbox.com/s/1a7cgetjyb8h8ho/mscoco_caption_train2014_processed.json?dl=0
Thank you so much for your reply! By the way, is this the right script to generate the dictionary and preprocessed data for Japanese?
python train_caption_model.py --savedir ./experiment1jp_yj --epoch 40 --batch 120 --gpu 0 \
--vocab ./data/MSCOCO/yjcaptions26k_clean_processed_dic.json \
--captions ./data/MSCOCO/yjcaptions26k_clean_processed.json \
--preload True
Let me download it and test it again ;)
I don't think your command generates preprocessed data. It looks fine to me, but it's a training command.
Sorry, I copied and pasted the wrong one!
python preprocess_MSCOCO_captions.py \
--input ../data/MSCOCO/yjcaptions26k_clean.json \
--output ../data/MSCOCO/yjcaptions26k_clean_processed.json \
--outdic ../data/MSCOCO/yjcaptions26k_clean_processed_dic.json \
--outfreq ../data/MSCOCO/yjcaptions26k_clean_processed_freq.json \
--cut 0 \
--char True
Yes, this is correct.
And the learning rate is nan if I run this command and then train the model ;) Do you have any idea why this would happen? I will try testing it with other things.
I don't think the learning rate can be nan. When I do the training, the loss quickly decreases to around 3.
Did you modify any of the code? You could clone the repository again and try from scratch. Also remember to use exactly the same Chainer version, 1.19.0.
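A quick way to confirm the installed version before training is a small check like the sketch below. The helper name `is_required_version` and the constant are hypothetical and not part of this repository; the only library attribute used is `chainer.__version__`.

```python
REQUIRED_CHAINER_VERSION = "1.19.0"  # the version named in this thread


def is_required_version(installed, required=REQUIRED_CHAINER_VERSION):
    """Return True only when the installed version string matches exactly."""
    return installed.strip() == required


if __name__ == "__main__":
    try:
        import chainer
    except ImportError:
        print("Chainer is not installed")
    else:
        if is_required_version(chainer.__version__):
            print("Chainer %s matches the required version" % chainer.__version__)
        else:
            print("Chainer %s found, but %s is required"
                  % (chainer.__version__, REQUIRED_CHAINER_VERSION))
```

If the versions differ, pinning the package with something like `pip install chainer==1.19.0` would normally be the fix, assuming that release is still installable in your environment.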
Cool, I will do it later. There is now another MSCOCO dataset in Japanese: https://stair.center/archives/338. I will test with both the old dataset and the new one ;)