visual7w-qa-models

Training with Visual Genome dataset

Open mrfarazi opened this issue 7 years ago • 0 comments

Hello @yukezhu ,

I am running train_telling.lua after generating qa_data.h5 and qa_data.json with prepare_dataset.py from the Visual Genome dataset (reading all the images from the VG_100K and VG_100K_2 folders). I am training in GPU mode with batch_size=1, and I get the following error after the 25th iteration. The same error occurs in CPU mode.

.
.
.
question 976648: where is this picture taken ? ten king doe while staying rc mason culture kitties familiar doe staying church lockers .	
evaluating validation performance... 250 (9.231342)	
validation loss: 	9.1918324186961	
wrote json checkpoint to checkpoints/model_id.json	
iter 1: 9.186291 (9.191735)	
iter 2: 9.096266 (9.191258)	
iter 3: 8.853861 (9.189571)	
iter 4: 8.986907 (9.188557)	
iter 5: 9.046180 (9.187845)	
iter 6: 8.980703 (9.186810)	
iter 7: 8.890981 (9.185331)	
iter 8: 8.588496 (9.182346)	
iter 9: 8.541151 (9.179140)	
iter 10: 8.375111 (9.175120)	
iter 11: 8.544146 (9.171965)	
iter 12: 8.388834 (9.168050)	
iter 13: 7.858945 (9.161504)	
iter 14: 7.269617 (9.152045)	
iter 15: 7.214907 (9.142359)	
iter 16: 7.827134 (9.135783)	
iter 17: 6.620107 (9.123205)	
iter 18: 7.074771 (9.112962)	
iter 19: 6.372057 (9.099258)	
iter 20: 7.658658 (9.092055)	
iter 21: 7.211345 (9.082651)	
iter 22: 5.337039 (9.063923)	
iter 23: 6.063521 (9.048921)	
iter 24: 6.835446 (9.037854)	
iter 25: 6.785455 (9.026592)	
/home/f/torch/install/bin/luajit: /home/f/torch/install/share/lua/5.1/hdf5/dataset.lua:114: attempt to perform arithmetic on a nil value
stack traceback:
	/home/f/torch/install/share/lua/5.1/hdf5/dataset.lua:114: in function 'rangesToOffsetAndCount'
	/home/f/torch/install/share/lua/5.1/hdf5/dataset.lua:136: in function 'partial'
	./misc/QADatasetLoader.lua:232: in function 'getBatch'
	train_telling.lua:177: in function 'lossFun'
	train_telling.lua:336: in main chunk
	[C]: in function 'dofile'
	...r226/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50
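
The traceback suggests that a range bound passed to `partial` from `getBatch` was nil, which could happen if an index looked up for a QA pair has no entry in the generated data. This is only a guess and is not confirmed against the loader's source. Below is a minimal sketch of a sanity check that could be run over such indices before training, written in Python to match prepare_dataset.py. The names `find_bad_ranges`, the `(start, end)` tuples, and `dataset_len` are hypothetical and not part of the repository; the 1-based inclusive ranges are an assumption about how the loader reads the HDF5 file.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation of one read request: 1-based, inclusive bounds.
# None stands in for a Lua nil, i.e. an index that could not be looked up.
Range = Tuple[Optional[int], Optional[int]]


def find_bad_ranges(ranges: Sequence[Range], dataset_len: int) -> List[int]:
    """Return the positions of ranges that would be unsafe to read.

    A range is flagged when either bound is missing, or when it does not
    fit inside 1..dataset_len (assumed 1-based and inclusive).
    """
    bad = []
    for pos, (start, end) in enumerate(ranges):
        if start is None or end is None:
            # Missing bound: arithmetic on it would fail.
            bad.append(pos)
        elif start < 1 or end > dataset_len or start > end:
            # Bound present but outside the dataset.
            bad.append(pos)
    return bad


# Made-up example: the second entry has no index, the fourth is out of range.
example = [(1, 1), (None, None), (5, 5), (12, 12)]
print(find_bad_ranges(example, dataset_len=10))  # prints [1, 3]
```

If a check like this flags entries, the likely place to look would be how prepare_dataset.py maps QA pairs to images, but that is an assumption as well.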

I would really appreciate it if someone could help me with this.

Thanks in advance.

mrfarazi, May 10 '17 23:05