crnn

Training fails, Tensor.lua:462: bad argument #1 to 'set'

Open j4zzcat opened this issue 6 years ago • 4 comments

I'm trying to port CRNN to ppc64le. I've used the most recent versions of all the frameworks (see https://github.com/j4zzcat/ppc64le/blob/master/poc/poc1.dockerfile), fixed whatever had to be fixed, and got to the point where the demo works, but the training breaks. This is most probably because of the somewhat different API of recent versions of Torch and nn compared to the versions used by the original CRNN work.

/opt/DL/torch/bin/luajit: /opt/DL/torch/share/lua/5.1/nn/Container.lua:67:
In 22 module of nn.Sequential:
/opt/DL/torch/share/lua/5.1/torch/Tensor.lua:462: bad argument #1 to 'set' (expecting number or Tensor or Storage)
stack traceback:
        [C]: in function 'set'
        /opt/DL/torch/share/lua/5.1/torch/Tensor.lua:462: in function 'view'
        /opt/DL/torch/share/lua/5.1/nn/View.lua:90: in function 'updateGradInput'
        /opt/DL/torch/share/lua/5.1/nn/Module.lua:31: in function </opt/DL/torch/share/lua/5.1/nn/Module.lua:29>
        [C]: in function 'xpcall'
        /opt/DL/torch/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /opt/DL/torch/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
        /root/crnn/src/training.lua:24: in function 'opfunc'
        /opt/DL/torch/share/lua/5.1/optim/adadelta.lua:31: in function 'optimMethod'
        /root/crnn/src/training.lua:29: in function 'trainBatch'
        /root/crnn/src/training.lua:94: in function 'trainModel'
        main_train.lua:51: in main chunk
        [C]: in function 'dofile'
        /opt/DL/torch/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x10005990

Digging a little deeper, it seems that line 22 within https://github.com/bgshih/crnn/blob/master/src/training.lua, which reads:

...
model:backward(inputBatch, criterion:backward(outputBatch, targetBatch))
...

is breaking because the result of criterion:backward(outputBatch, targetBatch) is empty, i.e., [torch.FloatTensor with no dimension]. Any help appreciated :-)

j4zzcat, Sep 05 '17 19:09

@j4zzcat Have you solved the problem? I encountered the same error too. Can you tell me how to solve it? Thanks a lot.

SDASDASA, Oct 02 '17 02:10

Exactly the same issue.

ilovin, May 12 '18 13:05

I had the same issue when running everything with the latest versions. I reverted to the library versions defined in the Dockerfile, and now it works (after fixing a couple of bugs).

khsibr, May 16 '18 21:05

Have you solved the problem?

wutianye, Apr 07 '19 11:04