Error when preprocessing with --split 2
rzai@rzai00:~/prj/san-torch/data$ python vqa_preprocess.py split 2
usage: vqa_preprocess.py [-h] [--download DOWNLOAD] [--split SPLIT]
vqa_preprocess.py: error: unrecognized arguments: split 2
rzai@rzai00:~/prj/san-torch/data$ python vqa_preprocess.py --split 2
parsed input parameters:
{
"download": 1,
"split": 2
}
Archive: zip/Questions_Train_mscoco.zip
inflating: annotations/OpenEnded_mscoco_train2014_questions.json
inflating: annotations/MultipleChoice_mscoco_train2014_questions.json
Archive: zip/Questions_Val_mscoco.zip
inflating: annotations/OpenEnded_mscoco_val2014_questions.json
inflating: annotations/MultipleChoice_mscoco_val2014_questions.json
Archive: zip/Questions_Test_mscoco.zip
inflating: annotations/OpenEnded_mscoco_test2015_questions.json
inflating: annotations/MultipleChoice_mscoco_test2015_questions.json
inflating: annotations/OpenEnded_mscoco_test-dev2015_questions.json
inflating: annotations/MultipleChoice_mscoco_test-dev2015_questions.json
Archive: zip/Annotations_Train_mscoco.zip
inflating: annotations/mscoco_train2014_annotations.json
Archive: zip/Annotations_Val_mscoco.zip
inflating: annotations/mscoco_val2014_annotations.json
Loading annotations and questions...
Training sample 369861, Testing sample 244302...
rzai@rzai00:~/prj/san-torch/data$
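For reference, the first attempt above fails only because `split 2` was passed without the leading dashes, so argparse treats both tokens as stray positional arguments. Here is a minimal sketch of that behavior, assuming the script declares its options roughly as the usage line suggests. The `--split` default of 1 and the helper names `build_parser` and `try_parse` are hypothetical, not taken from `vqa_preprocess.py`.

```python
import argparse


def build_parser():
    # Mirrors the usage line: [-h] [--download DOWNLOAD] [--split SPLIT].
    parser = argparse.ArgumentParser(prog="vqa_preprocess.py")
    # The log shows "download": 1 when the flag is omitted, so 1 is used as its default.
    parser.add_argument("--download", default=1, type=int)
    # Assumed default; the real script's default for --split is not shown in the log.
    parser.add_argument("--split", default=1, type=int)
    return parser


def try_parse(argv):
    """Return the parsed options as a dict, or None if argparse rejects argv."""
    try:
        return vars(build_parser().parse_args(argv))
    except SystemExit:
        # argparse prints the usage and error message to stderr, then exits.
        return None


if __name__ == "__main__":
    print(try_parse(["--split", "2"]))  # accepted: option flag plus value
    print(try_parse(["split", "2"]))    # rejected: unrecognized positional arguments
```

So the `--split 2` invocation is the valid one, and the traceback further down is unrelated to how the flag was passed.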
rzai@rzai00:~/prj/san-torch/data$ ll
total 260256
drwxrwxr-x 4 rzai rzai 4096 Nov 24 16:22 ./
drwxrwxr-x 7 rzai rzai 4096 Nov 23 20:22 ../
drwxrwxr-x 2 rzai rzai 4096 Nov 24 16:21 annotations/
-rw-rw-r-- 1 rzai rzai 12167843 Nov 23 19:33 Annotations_Train_mscoco.zip
-rw-rw-r-- 1 rzai rzai 6031604 Nov 23 19:33 Annotations_Val_mscoco.zip
-rw-rw-r-- 1 rzai rzai 26512941 Nov 23 19:33 Questions_Test_mscoco.zip
-rw-rw-r-- 1 rzai rzai 21985607 Nov 23 19:33 Questions_Train_mscoco.zip
-rw-rw-r-- 1 rzai rzai 10594497 Nov 23 19:33 Questions_Val_mscoco.zip
-rw-rw-r-- 1 rzai rzai 121 Nov 23 19:33 README.md
-rwxrwxr-x 1 rzai rzai 5880 Nov 23 19:33 vqa_preprocess.py*
-rwxrwxr-x 1 rzai rzai 5873 Nov 11 17:54 vqa_preprocess.py-backup*
-rw-rw-r-- 1 rzai rzai 72607329 Nov 24 16:22 vqa_raw_test.json
-rw-rw-r-- 1 rzai rzai 116551834 Nov 24 16:22 vqa_raw_train.json
drwxrwxr-x 2 rzai rzai 4096 Nov 23 19:33 zip/
rzai@rzai00:~/prj/san-torch/data$
rzai@rzai00:~/prj/san-torch/prepro$ python prepro_vqa.py --input_train_json ../data/vqa_raw_train.json --input_test_json ../data/vqa_raw_test.json --num_ans 1000
parsed input parameters:
{
"input_train_json": "../data/vqa_raw_train.json",
"num_ans": 1000,
"input_test_json": "../data/vqa_raw_test.json",
"word_count_threshold": 0,
"max_length": 26,
"output_h5": "../data/vqa_data_prepro.h5",
"output_json": "../data/vqa_data_prepro.json",
"token_method": "nltk"
}
top answer and their counts:
(86619, u'yes')
(54664, u'no')
(11941, u'2')
(6991, u'1')
(6756, u'white')
(6488, u'3')
(5318, u'red')
(4974, u'blue')
(3808, u'4')
(3714, u'green')
(3436, u'black')
(2785, u'yellow')
(2526, u'brown')
(2196, u'5')
(1663, u'tennis')
(1524, u'baseball')
(1516, u'right')
(1484, u'orange')
(1406, u'6')
(1390, u'left')
question number reduce from 369861 to 320029
example processed tokens:
['is', 'there', 'a', 'shadow', '?']
['is', 'this', 'one', 'bench', 'or', 'multiple', 'benches', '?']
['is', 'this', 'a', 'modern', 'train', '?']
['what', 'color', 'is', 'the', 'stripe', 'on', 'the', 'train', '?']
['what', 'is', 'on', 'the', 'other', 'side', 'of', 'the', 'train', '?']
['is', 'the', 'bus', 'driver', 'on', 'any', 'kind', 'of', 'antidepressant', 'medication', '?']
['is', 'the', 'bus', 'moving', '?']
['what', 'color', 'is', 'the', 'bus', '?']
['are', 'these', 'items', 'for', 'sale', '?']
['what', 'is', 'for', 'sale', 'under', 'this', 'tent', '?']
example processed tokens:
['are', 'the', 'dogs', 'tied', '?']
['is', 'this', 'a', 'car', 'show', '?']
['is', 'there', 'a', 'lady', 'sitting', 'inside', 'the', 'red', 'truck', '?']
['is', 'the', 'man', 'surfing', '?']
['what', 'color', 'is', 'the', 'man', "'s", 'swimsuit', '?']
['is', 'the', 'man', 'surfing', '?']
['what', 'does', 'the', 'tail', 'of', 'the', 'plane', 'say', '?']
['is', 'the', 'plane', 'gaining', 'altitude', '?']
['is', 'this', 'a', 'boeing', 'jet', '?']
['how', 'deep', 'do', 'you', 'think', 'the', 'snow', 'is', '?']
top words and their counts:
(320161, '?')
(225976, 'the')
(200545, 'is')
(118203, 'what')
(76624, 'are')
(64512, 'this')
(49209, 'in')
(45681, 'a')
(41629, 'on')
(40158, 'how')
(38230, 'many')
(37322, 'color')
(37023, 'of')
(29182, 'there')
(18392, 'man')
(14668, 'does')
(13492, 'people')
(12518, 'picture')
(11779, "'s")
(11758, 'to')
total words: 2284620
number of bad words: 0/14770 = 0.00%
number of words in vocab would be 14770
number of UNKs: 0/2284620 = 0.00%
inserting the special UNK token
Traceback (most recent call last):
File "prepro_vqa.py", line 292, in