neural-vqa

Error while evaluating through pretrained checkpoint

Open aekanshkansal1 opened this issue 7 years ago • 23 comments

I am trying to get results using the pretrained CPU checkpoint. My command is:

th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7 -input_image_path data/train2014/COCO_train2014_000000405541.jpg -question 'What is the cat on?' -gpuid -1

The error given is:

Loading data files...
/home/aekansh/torch/install/bin/lua: ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/aekansh/torch/pkg/torch/lib/TH/THDiskFile.c:349
stack traceback:
	[C]: in function 'readInt'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	./utils/DataLoader.lua:47: in function 'create'
	predict.lua:59: in main chunk
	[C]: in function 'dofile'
	.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: ?

Even if I do not use the -gpuid parameter, it gives the same error.

aekanshkansal1 avatar Mar 29 '17 08:03 aekanshkansal1

Do you have data.t7, answers_vocab.t7, and questions_vocab.t7 in the data/ folder and the model checkpoint in the checkpoints/ folder? (Download links given here).
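A minimal sketch of that check, assuming a POSIX shell run from the repository root. The helper name `check_files` is hypothetical and not part of neural-vqa. It also flags empty files, on the assumption (not confirmed anywhere in this thread) that a short or empty file is one possible cause of a read error like the one above.

```shell
#!/bin/sh
# check_files (hypothetical helper): report each path as
#   OK <path> <size in bytes>, EMPTY <path>, or MISSING <path>.
# Returns non-zero if any file is missing or empty.
check_files() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "MISSING $f"
            status=1
        elif [ ! -s "$f" ]; then
            echo "EMPTY $f"
            status=1
        else
            # wc -c gives the size in bytes; tr strips padding some platforms add.
            echo "OK $f $(wc -c < "$f" | tr -d ' ')"
        fi
    done
    return "$status"
}

# Example, using the file names mentioned in this thread:
# check_files data/data.t7 data/answers_vocab.t7 data/questions_vocab.t7 \
#     checkpoints/vqa_epoch23.26_0.4610_cpu.t7
```

Comparing the reported sizes against the sizes shown on the download page would be a reasonable next step if everything is reported as OK.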

abhshkdz avatar Mar 29 '17 17:03 abhshkdz

Yes, I have all of these in the correct folders.

aekanshkansal1 avatar Mar 30 '17 08:03 aekanshkansal1

That's odd. It seems like a path issue. Line 47 is local data = torch.load(tensor_file). If you start th and run torch.load('data/data.t7'), do you get the same error?

abhshkdz avatar Mar 30 '17 08:03 abhshkdz

If I start th and run torch.load('data/data.t7'), I get this error:

...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/aekansh/torch/pkg/torch/lib/TH/THDiskFile.c:349
stack traceback:
	...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:506: in function <...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:499>
	[C]: in function 'readInt'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	[string "_RESULT={torch.load('data/data.t7')}"]:1: in main chunk
	[C]: in function 'xpcall'
	...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: ?

aekanshkansal1 avatar Mar 30 '17 09:03 aekanshkansal1

Thanks. Same error. How about torch.load('data/data.t7', 'binary')?

abhshkdz avatar Mar 30 '17 09:03 abhshkdz

Same error still

aekanshkansal1 avatar Mar 30 '17 09:03 aekanshkansal1

..e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/aekansh/torch/pkg/torch/lib/TH/THDiskFile.c:349
stack traceback:
	...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:506: in function <...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:499>
	[C]: in function 'readInt'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	[string "_RESULT={torch.load('data/data.t7','binary'..."]:1: in main chunk
	[C]: in function 'xpcall'
	...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: ?

aekanshkansal1 avatar Mar 30 '17 09:03 aekanshkansal1

Thanks. Seems like an architecture issue. Could you try downloading data.ascii.t7 from here, and then try torch.load('data/data.ascii.t7', 'ascii')?

abhshkdz avatar Mar 30 '17 09:03 abhshkdz

After using load('data/data.ascii.t7', 'ascii'), the output is:

...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:
	...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:506: in function <...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:499>
	[C]: in function 'error'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
	...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	[string "_RESULT={torch.load('data/data.ascii.t7')}"]:1: in main chunk
	[C]: in function 'xpcall'
	...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: ?

aekanshkansal1 avatar Mar 30 '17 10:03 aekanshkansal1

Strange! The documentation suggests ascii should be readable. Not sure what the issue is. You might want to try asking on the Torch group: https://groups.google.com/forum/#!forum/torch7.

abhshkdz avatar Mar 30 '17 10:03 abhshkdz

OK thanks

aekanshkansal1 avatar Mar 30 '17 10:03 aekanshkansal1

@abhshkdz I do not get this error when I start th and run torch.load('data/data.t7'). Instead, I get output like this (showing just one entry):

121512 :
        {
          answer : 997
          image_id : 552610
          question : ShortTensor - size: 23
        }

I guess this works fine. But when I run predict.lua as th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7 -input_image_path data/train2014/COCO_train2014_000000405543.jpg -question 'What is in the plate', I get the following stack trace:

Loading data files...
loading checkpoint from checkpoints/vqa_epoch23.26_0.4610_cpu.t7
Warning: Failed to load function from bytecode: (binary): cannot load incompatible bytecodeWarning: Failed to load function from bytecode: [string "..."]:1: unexpected symbol near 'char(8)'/Users/WARL0CK/torch/install/bin/luajit: /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:
	[C]: in function 'error'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:307: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	...
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	.../WARL0CK/torch/install/share/lua/5.1/nngraph/gmodule.lua:495: in function 'read'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	predict.lua:64: in main chunk
	[C]: in function 'dofile'
	...L0CK/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x01045bb350

I guess what @aekanshkansal1 is facing is an ARM architecture issue, but ascii should work fine.

yadavankit avatar Mar 30 '17 10:03 yadavankit

Hey @yadavankit, data.t7 looks fine. The error suggests you're using luajit, whereas the checkpoint was created using lua5.1 (ref).
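A rough way to see which interpreter a Torch install uses, sketched under the assumption that `th` is the luarocks-generated wrapper script that names the interpreter binary it execs (the error prefixes in this thread, `.../bin/lua:` and `.../bin/luajit:`, show the same information). The helper name `th_interpreter` is hypothetical.

```shell
#!/bin/sh
# th_interpreter (hypothetical helper): print "luajit" or "lua" depending on
# which interpreter path the given th launcher script mentions, or "unknown"
# if neither appears or the file cannot be read.
th_interpreter() {
    launcher=$1
    # Check for luajit first, because "bin/lua" is a prefix of "bin/luajit".
    if grep -q 'bin/luajit' "$launcher" 2>/dev/null; then
        echo "luajit"
    elif grep -q 'bin/lua' "$launcher" 2>/dev/null; then
        echo "lua"
    else
        echo "unknown"
    fi
}

# Example:
# th_interpreter "$(command -v th)"
```

If this prints luajit, the torch/distro README describes rebuilding against Lua 5.1 by running `./clean.sh` and then `TORCH_LUA_VERSION=LUA51 ./install.sh` from the Torch source directory.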

abhshkdz avatar Mar 30 '17 10:03 abhshkdz

@abhshkdz thanks 👍 will clean and install 5.1 right away

yadavankit avatar Mar 30 '17 10:03 yadavankit

Now I am getting an error on evaluation. I already have the latest protobuf, 3.2.0, installed:

[libprotobuf INFO google/protobuf/io/coded_stream.cc:610] Reading dangerously large protocol message.  If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
lua(31231,0x7fffcec693c0) malloc: *** error for object 0x7fbb84e274e0: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
[1]    31231 abort      th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7

yadavankit avatar Mar 30 '17 11:03 yadavankit

Memory issues?

abhshkdz avatar Mar 30 '17 11:03 abhshkdz

Should I try increasing the limit as the message suggests, via CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h? What would you recommend doing?

yadavankit avatar Mar 30 '17 11:03 yadavankit

Don't think it's that. The bytes read seem to be lower than the limit. Does this error show up in every run? Could you monitor system memory and see whether usage is going over? You could also comment out everything after the loadcaffe.load(...) line and gradually uncomment things to pinpoint what's causing the issue.
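To answer the "every run?" question with numbers, one option is to repeat the command and count non-zero exits. This is a minimal sketch assuming a POSIX shell; `count_failures` is a hypothetical helper, not part of neural-vqa. A run killed by a signal (such as the abort above) also counts as a failure, because the shell reports it with a non-zero status.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...] (hypothetical helper): run CMD N times and
# print how many of those runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example, using the command from this thread:
# count_failures 10 th predict.lua \
#     -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7 \
#     -input_image_path data/train2014/COCO_train2014_000000405543.jpg \
#     -question 'What is in the plate'
```

A count equal to the number of runs points to a deterministic failure. A count in between points to an intermittent one, which fits better with memory corruption than with a bad input file.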

abhshkdz avatar Mar 30 '17 12:03 abhshkdz

No, every now and then this error occurs too:

Loading data files...
loading checkpoint from checkpoints/vqa_epoch23.26_0.4610_cpu.t7
[libprotobuf INFO google/protobuf/io/coded_stream.cc:610] Reading dangerously large protocol message.  If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
[1]    33360 bus error  th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7

yadavankit avatar Mar 30 '17 13:03 yadavankit

I don't think system memory will be the problem here. (Screenshot, 2017-03-30 at 6:42:17 pm)

yadavankit avatar Mar 30 '17 13:03 yadavankit

I am now getting this too:

lua(33505,0x7fffcec693c0) malloc: *** error for object 0x7faf1dee6210: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
[1]    33505 abort      th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7

yadavankit avatar Mar 30 '17 13:03 yadavankit

Hi @yadavankit, I get the same error. How did you fix it? Thank you.

"Loading data files... loading checkpoint from checkpoints/vqa_epoch23.26_0.4610_cpu.t7 Warning: Failed to load function from bytecode: (binary): cannot load incompatible bytecodeWarning: Failed to load function from bytecode: [string "..."]:1: unexpected symbol near 'char(8)'/Users/WARL0CK/torch/install/bin/luajit: /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:375: unknown object stack traceback:"

zhimeng9 avatar Sep 08 '17 08:09 zhimeng9

@zhimeng9 which version of Lua are you using?

yadavankit avatar Sep 10 '17 11:09 yadavankit