cudnn.torch

Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnSetConvolutionNdDescriptor)

Open • kuixu opened this issue 8 years ago • 4 comments

I get the following error when running cifar10 on 2 GPUs. The command line is:

    CUDA_VISIBLE_DEVICES=0,1 th main.lua -depth 50 -batchSize 1024 -nGPU 2 -nThreads 8 -shareGradInput true -dataset cifar10

/home/scs4850/torch/install/share/lua/5.1/cudnn/init.lua:162: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnSetConvolutionNdDescriptor)
stack traceback:
        [C]: in function 'error'
        /home/scs4850/torch/install/share/lua/5.1/cudnn/init.lua:162: in function 'errcheck'
        /home/scs4850/torch/install/share/lua/5.1/cudnn/init.lua:217: in function 'setConvolutionDescriptor'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:130: in function 'createIODescriptors'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:187: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:185>
        [C]: in function 'xpcall'
        /home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:41>
        [C]: in function 'xpcall'
        /home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        ...
        /home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:41>
        [C]: in function 'xpcall'
        /home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:41>
        [C]: in function 'xpcall'
        .../scs4850/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
        /home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:41>
        [C]: in function 'pcall'
        /home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
        [string "  local Queue = require 'threads.queue'..."]:13: in main chunk

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
        [C]: in function 'error'
        /home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
        /home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:41>
        [C]: in function 'xpcall'
        .../scs4850/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
        /home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:41>
        [C]: in function 'pcall'
        /home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
        [string "  local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
        [C]: in function 'error'
        .../scs4850/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
        .../scs4850/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
        ...0/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:717: in function 'exec'
        ...0/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:195: in function 'forward'
        ./train.lua:116: in function 'test'
        main.lua:52: in main chunk
        [C]: in function 'dofile'
        ...4850/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670

Can anyone here help me?

kuixu • Dec 06 '16

I am seeing a similar error too. Were you able to resolve this?

livenletdie • Dec 22 '16

Same problem here.

shimen • Feb 07 '17

I also have the same error.

caseyanya • Mar 10 '17

Have you tried a recent version of cudnn.torch? This commit likely fixed the issue: https://github.com/soumith/cudnn.torch/commit/7f3e2b22c50d12c8583f33ff792c88d692bcef49

gchanan • Mar 10 '17