relativeCameraPose

Problem with torch load pre-trained model

Open SHENG-KAI-HUANG opened this issue 6 years ago • 5 comments

Hi, I was trying to use the pre-trained model downloaded from this repository, but I ran into the following problem:

```
==> loading model from pretained weights from file: ./pre-trained/siam_hybridnet_fullsized.t7
Warning: Failed to load function from bytecode: binary string: not a precompiled chunk
Warning: Failed to load function from bytecode: [string ""]:1: unexpected symbol near char(4)
/home/mark/torch/install/bin/lua: /home/mark/torch/install/share/lua/5.2/torch/File.lua:375: unknown object
stack traceback:
	[C]: in function 'error'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:375: in function 'readObject'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:307: in function 'readObject'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
	/home/mark/torch/install/share/lua/5.2/nn/Module.lua:192: in function 'read'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
	/home/mark/torch/install/share/lua/5.2/nn/Module.lua:192: in function 'read'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
	...
	...k/torch/install/share/lua/5.2/cunn/DataParallelTable.lua:398: in function 'read'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:409: in function 'load'
	/usr/relativeCameraPose-master/gpu_util.lua:54: in function 'loadDataParallel'
	/usr/relativeCameraPose-master/model.lua:71: in main chunk
	[C]: in function 'dofile'
	/home/mark/torch/install/share/lua/5.2/paths/init.lua:84: in function 'dofile'
	main.lua:29: in main chunk
	[C]: in function 'dofile'
	...mark/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: in ?
```

Here is the pre-trained model's MD5 hash (computed with the md5sum command): bdf13b947817bd7d3244309b2cda811d ./pre-trained/siam_hybridnet_fullsized.t7

Is this file corrupted, or is something else wrong? Could anyone help me?

SHENG-KAI-HUANG • Oct 31 '18 06:10

By the way, I also tried loading the model in 'ascii' mode, but got a different error:

```
/home/mark/torch/install/bin/lua: /home/mark/torch/install/share/lua/5.2/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/mark/torch/pkg/torch/lib/TH/THDiskFile.c:352
stack traceback:
	[C]: in function 'readInt'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:259: in function 'readObject'
	/home/mark/torch/install/share/lua/5.2/torch/File.lua:409: in function 'load'
	test.lua:4: in main chunk
	[C]: in function 'dofile'
	...mark/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: in ?
```

SHENG-KAI-HUANG • Oct 31 '18 08:10

Hi there, thank you for your interest in our work. The MD5 sum is correct. Which versions of CUDA and cuDNN do you have? I installed Torch and all the packages (nn, cunn, inn, cudnn) from scratch (with CUDA 9.2 and cuDNN 5.1), and I was at least able to load the model.

imelekhov • Nov 01 '18 15:11

@imelekhov Thank you for your answer. I am using CUDA 8.0 and cuDNN 6.0.

I have trained the model myself and saved some snapshots, and I can load those .t7 files without problems. According to the Torch7 documentation, the binary serialization format is platform-dependent, while the ASCII format is platform-independent. So perhaps differences in setup (or package versions) between your environment and mine are causing this error, and an ASCII-format pre-trained model might solve it. Would you mind converting the pre-trained model to ASCII format?

SHENG-KAI-HUANG • Nov 02 '18 04:11

I see. Sure, no problem. I have converted the original weights to ASCII format and put an archive here. The MD5 sum of the file inside is afcb6f1be9caf4a23d94b399fddfeb3d. Let me know if something goes wrong.
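A conversion like this can be expressed with the optional format argument of `torch.load` and `torch.save` ('binary' is the default, 'ascii' the alternative). This is a minimal sketch, not necessarily the script used to build the archive: `convertT7` is a hypothetical helper, the output file name is made up, and for this particular model the packages whose modules it contains (the first traceback goes through `nn` and `cunn/DataParallelTable.lua`; `cudnn` and `inn` are also mentioned in this thread) presumably have to be required before loading.

```lua
require 'torch'

-- Hypothetical helper: read a .t7 file written in one Torch7 serialization
-- format and write the same object back out in another format.
-- Valid formats include 'binary' (torch's default) and 'ascii'.
-- Packages whose classes appear in the object (nn, cunn, cudnn, ...) must be
-- required before calling this, or torch.load cannot rebuild those objects.
local function convertT7(srcPath, dstPath, srcFormat, dstFormat)
  local obj = torch.load(srcPath, srcFormat or 'binary')
  torch.save(dstPath, obj, dstFormat or 'ascii')
  return obj
end

-- Usage sketch with the file name from this thread (output name is hypothetical):
-- require 'nn'; require 'cunn'; require 'cudnn'; require 'inn'
-- convertT7('./pre-trained/siam_hybridnet_fullsized.t7',
--           './pre-trained/siam_hybridnet_fullsized_ascii.t7')
-- local model = torch.load('./pre-trained/siam_hybridnet_fullsized_ascii.t7', 'ascii')
```

The format passed when loading has to match the one used when saving: an ASCII file must be read with `torch.load(path, 'ascii')`, and reading a binary file with 'ascii' fails, as in the second traceback in this thread.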

imelekhov • Nov 02 '18 09:11

Well, I still have a problem. This is the error message when loading the ASCII model:

```
Warning: Failed to load function from bytecode: (binary): cannot load incompatible bytecode
Warning: Failed to load function from bytecode: [string "2..."]:1: unexpected symbol near '2'
luajit: /home/mark/torch/install/share/lua/5.1/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/mark/torch/pkg/torch/lib/TH/THDiskFile.c:352
```

I am now using Ubuntu 16.04 with Lua 5.1, and I'm not sure whether the Lua version matters. It looks like some symbol (or string?) in the ASCII file can't be recognized on my machine. I will find some time to install CUDA 9.2 and cuDNN 5.1 and try again, and I will report the result as soon as possible.
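Whether the Lua version matters can be checked directly. The first traceback was produced by `/home/mark/torch/install/bin/lua` with packages under `share/lua/5.2`, while this one comes from `luajit` with `share/lua/5.1`. As far as I can tell from `torch/File.lua`, Lua functions stored in a `.t7` file are written with `string.dump` and reloaded from that bytecode (which is where the "Failed to load function from bytecode" warnings come from), regardless of binary or ASCII mode, and precompiled Lua bytecode is not portable between Lua versions or between PUC Lua and LuaJIT. A minimal plain-Lua sketch (no Torch needed; `bytecodeFlavor` is a hypothetical helper name) that reports which bytecode format the running interpreter produces:

```lua
-- Report which bytecode format string.dump produces in this interpreter.
-- PUC Lua chunks start with ESC "Lua" followed by a version byte (0x51 = 5.1,
-- 0x52 = 5.2, ...); LuaJIT chunks start with ESC "LJ" and a LuaJIT-specific
-- bytecode version byte. The two families cannot load each other's bytecode.
local function bytecodeFlavor()
  local dumped = string.dump(function() end)
  if dumped:sub(1, 4) == "\27Lua" then
    local v = dumped:byte(5)
    return string.format("PUC Lua %d.%d", math.floor(v / 16), v % 16)
  elseif dumped:sub(1, 3) == "\27LJ" then
    return "LuaJIT (bytecode version " .. dumped:byte(4) .. ")"
  end
  return "unknown"
end

-- _VERSION reports "Lua 5.1" under LuaJIT, so print both values.
print(_VERSION, bytecodeFlavor())
```

Running it with the interpreter that saved a model and with the one loading it gives a quick comparison: a mismatch means the embedded functions cannot be reloaded, while a match does not by itself guarantee that the rest of the file is compatible.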

By the way, would you mind sharing the Landmarks dataset you used for training and validation in the paper? I have looked at the original dataset, but I don't know how to use it in the way you describe in the paper.

SHENG-KAI-HUANG • Nov 03 '18 04:11