
Training is very slow. Any suggestions?

Open Matrixio opened this issue 7 years ago • 5 comments

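After adding SCNN, training became very slow, possibly because SCNN behaves like an RNN. Previously I could process 20,000 images (one full epoch) in 1 hour; now training is at least 6-7 times slower.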

Matrixio avatar Aug 13 '18 13:08 Matrixio

@Matrixio , this is expected, since SCNN propagates information sequentially. To speed up training, the simplest option is to use multiple GPUs. Alternatively, use a more efficient, lightweight backbone (e.g., replace VGG-16 with ResNet-18).
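To illustrate why sequential propagation is slow, here is a minimal pure-Python sketch of a single top-to-bottom pass. It is not the SCNN implementation: it assumes one channel, one direction, and a hypothetical 1-D kernel, and the function names `conv1d_same` and `propagate_down` are made up for this example. The point is that row `i` needs the already-updated row `i-1`, so the loop over rows cannot run in parallel the way an ordinary convolution layer does.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def conv1d_same(row, kernel):
    """Slide `kernel` over `row` with zero padding; output has the same length."""
    half = len(kernel) // 2
    out = []
    for j in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = j + k - half
            if 0 <= idx < len(row):
                acc += w * row[idx]
        out.append(acc)
    return out


def propagate_down(feature_map, kernel):
    """Top-to-bottom pass: each row receives a message from the updated row above.

    Assumption: single channel and a single direction, for illustration only.
    """
    rows = [list(r) for r in feature_map]
    for i in range(1, len(rows)):  # H - 1 steps, each depending on the previous one
        message = conv1d_same(rows[i - 1], kernel)
        rows[i] = [x + relu(m) for x, m in zip(rows[i], message)]
    return rows


feature_map = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
result = propagate_down(feature_map, [0.5, 0.5, 0.5])
for row in result:
    print(row)
```

In this toy input, the single non-zero value in the first row spreads into every later row, but only one row per step. On a real feature map this dependency chain is as long as the map's height or width for each direction, which is consistent with the slowdown reported above.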

cardwing avatar Oct 12 '18 05:10 cardwing

@Matrixio , you can refer to Codes-for-Lane-Detection, where I will put my implementations of lane detection models.

cardwing avatar Oct 12 '18 07:10 cardwing

@cardwing @XingangPan Hello, I get the following error during both testing and training. Could this be a cuDNN version problem? I am using CUDA 8.0 and cuDNN v4.

data created data loaded data loaded 1 bad
/home/kb457/torch/install/bin/luajit: /home/kb457/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:69: bad argument #1 to 'resizeAs' (torch.CudaTensor expected, got userdata)
stack traceback:
    [C]: in function 'resizeAs'
    ...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:69: in function <...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:63>
    [C]: in function 'xpcall'
    /home/kb457/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/kb457/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    testLane.lua:72: in main chunk
    [C]: in function 'dofile'
    ...b457/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405de0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    /home/kb457/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    /home/kb457/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    testLane.lua:72: in main chunk
    [C]: in function 'dofile'
    ...b457/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405de0

Thank you very much. I have been stuck on this problem for several days.

planckztd avatar Nov 10 '18 09:11 planckztd

@planckztd , please refer to this issue. The error is most likely caused by the cuDNN version; upgrading cuDNN from 4.0 to 5.0 should fix it.

cardwing avatar Nov 10 '18 13:11 cardwing

@planckztd , you can also refer to this repo, which is a bit faster.

cardwing avatar Dec 01 '18 04:12 cardwing