Training fails, Tensor.lua:462: bad argument #1 to 'set'
I'm trying to port CRNN to ppc64le. I've used the most recent versions of all the frameworks (see https://github.com/j4zzcat/ppc64le/blob/master/poc/poc1.dockerfile), fixed whatever had to be fixed, and got to the point where the demo works, but the training breaks. This is most probably because of the somewhat different API of recent versions of Torch and nn compared to the versions used by the original CRNN work.
/opt/DL/torch/bin/luajit: /opt/DL/torch/share/lua/5.1/nn/Container.lua:67:
In 22 module of nn.Sequential:
/opt/DL/torch/share/lua/5.1/torch/Tensor.lua:462: bad argument #1 to 'set' (expecting number or Tensor or Storage)
stack traceback:
[C]: in function 'set'
/opt/DL/torch/share/lua/5.1/torch/Tensor.lua:462: in function 'view'
/opt/DL/torch/share/lua/5.1/nn/View.lua:90: in function 'updateGradInput'
/opt/DL/torch/share/lua/5.1/nn/Module.lua:31: in function </opt/DL/torch/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/opt/DL/torch/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/opt/DL/torch/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
/root/crnn/src/training.lua:24: in function 'opfunc'
/opt/DL/torch/share/lua/5.1/optim/adadelta.lua:31: in function 'optimMethod'
/root/crnn/src/training.lua:29: in function 'trainBatch'
/root/crnn/src/training.lua:94: in function 'trainModel'
main_train.lua:51: in main chunk
[C]: in function 'dofile'
/opt/DL/torch/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x10005990
Digging a little deeper, it seems that line 22 within https://github.com/bgshih/crnn/blob/master/src/training.lua, which reads:
...
model:backward(inputBatch, criterion:backward(outputBatch, targetBatch))
...
is breaking because the result of criterion:backward(outputBatch, targetBatch) is empty, i.e., [torch.FloatTensor with no dimension]. Any help appreciated :-)
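For anyone debugging the same thing, a minimal sanity check around that call might look like the sketch below. It reuses the variable names from CRNN's training.lua (model, criterion, inputBatch, outputBatch, targetBatch); the assert and its message are my own addition, not part of the original code:

```lua
-- Split the nested call from training.lua line 22 so the criterion's
-- gradient can be inspected before it is fed into model:backward.
local gradOutput = criterion:backward(outputBatch, targetBatch)

-- With incompatible Torch/nn versions, gradOutput comes back as an
-- empty FloatTensor, which later breaks nn.View's updateGradInput.
assert(gradOutput:nDimension() > 0,
  'criterion:backward returned an empty tensor; check Torch/nn versions')

model:backward(inputBatch, gradOutput)
```

This doesn't fix the underlying version mismatch, but it fails at the actual source of the problem instead of deep inside nn.Sequential's backward pass.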
@j4zzcat have you solved the problem? I encountered the same error too. Can you tell me how to solve it? Thanks a lot.
exactly same issue
I had the same issue running everything with the latest versions. I reverted to the library versions defined in the Dockerfile and now it works (after fixing a couple of bugs).