cudnn.torch
Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnSetConvolutionNdDescriptor)
I get the error messages below when running cifar10 on 2 GPUs. The command line is: CUDA_VISIBLE_DEVICES=0,1 th main.lua -depth 50 -batchSize 1024 -nGPU 2 -nThreads 8 -shareGradInput true -dataset cifar10
/home/scs4850/torch/install/share/lua/5.1/cudnn/init.lua:162: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnSetConvolutionNdDescriptor)
stack traceback:
[C]: in function 'error'
/home/scs4850/torch/install/share/lua/5.1/cudnn/init.lua:162: in function 'errcheck'
/home/scs4850/torch/install/share/lua/5.1/cudnn/init.lua:217: in function 'setConvolutionDescriptor'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:130: in function 'createIODescriptors'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:187: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:185>
[C]: in function 'xpcall'
/home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...
/home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
.../scs4850/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
.../scs4850/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
.../scs4850/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
.../scs4850/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
...0/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:717: in function 'exec'
...0/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:195: in function 'forward'
./train.lua:116: in function 'test'
main.lua:52: in main chunk
[C]: in function 'dofile'
...4850/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Could anyone here help me?
I am seeing a similar error too. Were you able to troubleshoot this?
Same problem here.
I also have the same error.
Have you tried a recent version of cudnn.torch? This commit likely fixed the issue: https://github.com/soumith/cudnn.torch/commit/7f3e2b22c50d12c8583f33ff792c88d692bcef49