HieCoAttenVQA
run prepro_vqa.py error when split=2
I got this error when `split=2`, while `split=1` works very well. The commands are:

```
python vqa_preprocess.py --download 1 --split 2
python prepro_vqa.py --input_train_json ../data/vqa_raw_train.json --input_test_json ../data/vqa_raw_test.json --num_ans 1000
```
The error is:

```
top words and their counts:
(320161, '?')
(225976, 'the')
(200545, 'is')
(118203, 'what')
(76624, 'are')
(64512, 'this')
(49209, 'in')
(45681, 'a')
(41629, 'on')
(40158, 'how')
(38230, 'many')
(37322, 'color')
(37023, 'of')
(29182, 'there')
(18392, 'man')
(14668, 'does')
(13492, 'people')
(12518, 'picture')
(11779, "'s")
(11758, 'to')
total words: 2284620
number of bad words: 0/14770 = 0.00%
number of words in vocab would be 14770
number of UNKs: 0/2284620 = 0.00%
inserting the special UNK token
Traceback (most recent call last):
  File "prepro_vqa.py", line 292, in
```
There is no `'ans'` key on split 2; you should modify the lines that access it.
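One way to guard such an access, as a minimal sketch: split 2 is the test split, which has no ground-truth answers, so the answer lookup should be skipped for it. The function and variable names below (`encode_answers`, `imgs`, `atoi`) are hypothetical illustrations, not the actual code in `prepro_vqa.py`:

```python
def encode_answers(imgs, atoi, split):
    """Map answer strings to indices; skip entries without ground truth.

    imgs:  list of question dicts, each optionally carrying an 'ans' key
    atoi:  answer-string -> index mapping (0 reserved for unknown answers)
    split: 1 for train/val (answers available), 2 for test (no answers)
    """
    labels = []
    for img in imgs:
        if split == 2 or 'ans' not in img:
            labels.append(-1)  # placeholder: no answer available at test time
        else:
            labels.append(atoi.get(img['ans'], 0))
    return labels

# toy usage
imgs = [{'question': 'what color?', 'ans': 'red'}, {'question': 'how many?'}]
atoi = {'red': 1, 'two': 2}
print(encode_answers(imgs, atoi, split=1))  # [1, -1]
print(encode_answers(imgs, atoi, split=2))  # [-1, -1]
```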
@idansc Thank you, I fixed this code bug and downloaded the pretrained model from [here](https://filebox.ece.vt.edu/~jiasenlu/codeRelease/co_atten/model/vqa_model/model_alternating_train-val_vgg.t7). I submitted the result under the name `vqa_OpenEnded_mscoco_test-dev2015_HieCoAttenVQA_results.json` and got this accuracy: `{"overall": 43.11, "perAnswerType": {"other": 15.74, "number": 29.76, "yes/no": 78.73}}`
But the overall accuracy in the paper is 60.1%; I really don't know where this gap comes from.
Did they provide the json and h5 files as well? The model needs to be aligned with the pre-processed files.
Yeah, the json is provided, and I generated the h5 file myself:

```
th prepro_img_vgg.lua -input_json ../data/vqa_data_prepro.json -image_root /home/jiasenlu/data/ -cnn_proto ../image_model/VGG_ILSVRC_19_layers_deploy.prototxt -cnn_model ../image_model/VGG_ILSVRC_19_layers.caffemodel
```
That's fine for the images, but what about the h5 file containing the preprocessed question dataset? If it's not provided, it will cause problems in training (the model needs to match the right answers to the questions). I believe the gap is caused by some sort of misalignment in the pre-processing.
@idansc Any idea how to fix this misalignment problem?
Did you try training the model yourself? By the way, are you running on CPU or GPU?
I have tried to train this model on a GPU (M40), but training is very slow (12.5 hours per epoch; the paper uses 250 epochs), and I'm trying to find the bottleneck.
An epoch should take a few minutes on an M40. Check that your stored CNN features are on an SSD, or use DataLoader if you have enough RAM (about 60 GB).
Yes, it should be. I trained another model based on another GitHub repo and the speed was very fast, although the accuracy was not good enough: here. I will double-check the hardware.
@xhzhao @idansc Sorry for going off-topic, but I really need some help here. I trained the model on a customized VQA dataset, but I'm not sure how to run the evaluation now. I read the README, but it's not clear from there. I would highly appreciate any help, and if you are open to discussion, I guess we can continue here.
@xhzhao I met the same problem as you, but I have no idea how to deal with it. How did you solve it?
OK, I made it :)
@lupantech @yauhen-info @xhzhao How did you solve the problem? I am using the VQA v1.9 dataset, the `eval.lua` provided by HieCoAttenVQA, and the `vqaEvalDemo.py` provided by VT-vision-lab/VQA, and it reports an error in `vqa.py`:

> Results do not correspond to current VQA set. Either the results do not have predictions for all question ids in annotation file or there is at least one question id that does not belong to the question ids in the annotation file.

I saw one suggestion to use the `eval.lua` in VT-vision-lab/VQA_LSTM_CNN, but it is not well suited to the one in HieCoAttenVQA. Thanks!
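One way to narrow down that error is to compare the question ids in the results file against those in the annotation file before running the official evaluator. A minimal sketch (the helper name `check_question_ids` and the file names in the usage comment are assumptions, not part of the VQA tools):

```python
import json

def check_question_ids(results, annotations):
    """Compare question ids in a results list against a VQA annotation dict.

    results:     list of {'question_id': ..., 'answer': ...} dicts
    annotations: dict with an 'annotations' list, as in the VQA json files
    Returns (missing, extra): ids lacking a prediction, and ids the
    annotation file does not know about. Both must be empty to evaluate.
    """
    res_ids = {r['question_id'] for r in results}
    ann_ids = {a['question_id'] for a in annotations['annotations']}
    return ann_ids - res_ids, res_ids - ann_ids

# usage (file names are illustrative):
# results = json.load(open('vqa_OpenEnded_mscoco_results.json'))
# annotations = json.load(open('mscoco_val2014_annotations.json'))
# missing, extra = check_question_ids(results, annotations)
```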
I found that the error is due to the new `question_id` values in VQA dataset v1.9 exceeding the integer precision of FloatTensor, which causes a mismatch when copying from `data.ques_id`. Simply changing one line of code fixes it: `local ques_id = torch.DoubleTensor(total_num)`
@panfengli Which file is the line `ques_id = torch.DoubleTensor(total_num)` in?