
trouble with cuDNN versions with CUDA 9.0

Open • paciorek opened this issue 6 years ago • 4 comments

I'm using conda to install caffe2 under Ubuntu 16.04 in a conda environment (using the exact conda installation command suggested on the caffe2 installation page), and I have one suggestion and one question.

I have CUDA 9.0 installed in /usr/local, along with several versions of cuDNN (see below, as this is part of my question).

My suggestion is to state on the caffe2 installation page exactly which version of cuDNN is required (i.e., "7.x.y" rather than just "7"). From the error message below, it seems that 7.1.1 is needed, but that version causes a different error.

My question is how to fix the following error, which occurs when I use cuDNN 7.1.1 for CUDA 9.0:

(/tmp/caffe2-py36) paciorek@scf-sm20:/tmp/caffe2-py36/test> ./elementwise_op_gpu_test 
Running main() from gtest_main.cc
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from ElementwiseGPUTest
[ RUN      ] ElementwiseGPUTest.And
unknown file: Failure
C++ exception with description "[enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator: 
input: "X" input: "Y" output: "Z" name: "test" type: "And" device_option { device_type: 1 }" thrown in the test body.
[  FAILED  ] ElementwiseGPUTest.And (2159 ms)
[ RUN      ] ElementwiseGPUTest.Or
unknown file: Failure
C++ exception with description "[enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator: 
input: "X" input: "Y" output: "Z" name: "test" type: "Or" device_option { device_type: 1 }" thrown in the test body.
[  FAILED  ] ElementwiseGPUTest.Or (0 ms)
[ RUN      ] ElementwiseGPUTest.Xor
unknown file: Failure

In contrast, when I use cuDNN 7.0.5 for CUDA 9.0, the 'version mismatch' error indicates that caffe2 wants 7.1.1 (hence the attempt above). (I get the analogous error with cuDNN 7.1.2.)

(/tmp/caffe2-py36) paciorek@scf-sm20:/tmp/caffe2-py36/test> ./elementwise_op_gpu_test 
Running main() from gtest_main.cc
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from ElementwiseGPUTest
[ RUN      ] ElementwiseGPUTest.And
unknown file: Failure
C++ exception with description "[enforce fail at common_cudnn.h:118] version_match. cuDNN compiled (7101) and runtime (7005) versions mismatch " thrown in the test body.
[  FAILED  ] ElementwiseGPUTest.And (1730 ms)
[ RUN      ] ElementwiseGPUTest.Or
unknown file: Failure
C++ exception with description "[enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator: 
input: "X" input: "Y" output: "Z" name: "test" type: "Or" device_option { device_type: 1 }" thrown in the test body.
[  FAILED  ] ElementwiseGPUTest.Or (289 ms)
[ RUN      ] ElementwiseGPUTest.Xor
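The two integers in the mismatch message use cuDNN 7's version encoding, major*1000 + minor*100 + patch, so 7101 is 7.1.1 and 7005 is 7.0.5. The Python sketch below decodes them and shows how to ask the loaded library for its version with `cudnnGetVersion()` through `ctypes`. The `exact_match` check is an assumption inferred from the reports above (both 7.0.5 and 7.1.2 fail against a 7.1.1 build), not caffe2's actual source, and the library name `libcudnn.so.7` is an assumption about a standard Linux install.

```python
import ctypes


def decode_cudnn_version(v):
    """Split a cuDNN 7-style version integer (major*1000 + minor*100 + patch)."""
    major, rest = divmod(v, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def format_version(v):
    """Render a version integer as "major.minor.patch"."""
    return ".".join(str(part) for part in decode_cudnn_version(v))


def exact_match(compiled, runtime):
    # Assumption: a strict equality check, as the "versions mismatch"
    # errors above suggest. This is not copied from caffe2.
    return compiled == runtime


def runtime_cudnn_version(libname="libcudnn.so.7"):
    """Return the version of the cuDNN library the dynamic loader resolves.

    libname is an assumption; adjust it for your installation.
    """
    lib = ctypes.CDLL(libname)
    lib.cudnnGetVersion.restype = ctypes.c_size_t  # size_t cudnnGetVersion(void)
    return lib.cudnnGetVersion()


if __name__ == "__main__":
    compiled, runtime = 7101, 7005  # values from the error message above
    print(format_version(compiled), format_version(runtime),
          exact_match(compiled, runtime))
```

If `runtime_cudnn_version()` reports a different value than the one you expect to have installed, the loader is resolving a different `libcudnn` than intended; the directories on `LD_LIBRARY_PATH` are a reasonable first place to look.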

paciorek, Mar 23 '18

Thanks for discovering this for us. I didn't realize that the cuDNN version had to match exactly. I'll add a note about this.

Did you build from source (conda build conda/cuda...) or use the binary install (conda install -c caffe2 ...)?

I believe we use whatever cuDNN version ships in these Docker images: https://hub.docker.com/r/nvidia/cuda/tags/

pjh5, Mar 27 '18

I used the binary.

Do you know what would cause the error when I do use the matching cuDNN (7.1.1)?

paciorek, Mar 27 '18

I've seen a few people report this error message, but I've never been able to reproduce it myself, so I can't say for certain. Do you know whether your GPU supports the versions of CUDA and cuDNN that you have?
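The CUDA error `no kernel image is available for execution on the device` (`cudaErrorNoKernelImageForDevice`) typically indicates that the binary contains no code built for the GPU's compute capability, which is a separate issue from the cuDNN version check; note that in the 7.0.5 run above, the `Or` test still fails this way. The Python sketch below reads the device's compute capability through the CUDA runtime and checks it against a list of compilation targets using CUDA's compatibility rules: a cubin built for `sm_XY` runs only on devices of capability X.Z with Z ≥ Y, while PTX built for `compute_XY` can be JIT-compiled for any device of equal or higher capability. The library name `libcudart.so.9.0` and the target list are assumptions; one way to obtain the real targets is `cuobjdump --list-elf` (and `--list-ptx`) on the caffe2 GPU shared library.

```python
import ctypes

# cudaDeviceAttr enum values from the CUDA runtime headers.
CUDA_DEV_ATTR_CC_MAJOR = 75  # cudaDevAttrComputeCapabilityMajor
CUDA_DEV_ATTR_CC_MINOR = 76  # cudaDevAttrComputeCapabilityMinor


def compute_capability(device=0, libname="libcudart.so.9.0"):
    """Return (major, minor) for a GPU via cudaDeviceGetAttribute.

    libname is an assumption about a standard CUDA 9.0 install on Linux.
    """
    lib = ctypes.CDLL(libname)
    values = []
    for attr in (CUDA_DEV_ATTR_CC_MAJOR, CUDA_DEV_ATTR_CC_MINOR):
        out = ctypes.c_int()
        err = lib.cudaDeviceGetAttribute(ctypes.byref(out), attr, device)
        if err != 0:  # 0 is cudaSuccess
            raise RuntimeError(f"cudaDeviceGetAttribute returned error {err}")
        values.append(out.value)
    return tuple(values)


def has_kernel_image(cc, targets):
    """Check whether any target (e.g. "sm_61", "compute_60") can run on cc.

    targets is a hypothetical list; obtain the real one from the binary.
    """
    major, minor = cc
    for target in targets:
        kind, digits = target.split("_")
        t_major, t_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and t_major == major and t_minor <= minor:
            return True  # cubin: same major, device minor >= target minor
        if kind == "compute" and (t_major, t_minor) <= (major, minor):
            return True  # PTX: JIT-compiled for equal or newer devices
    return False


if __name__ == "__main__":
    # Hypothetical targets; replace with the architectures found in the binary.
    print(has_kernel_image((7, 0), ["sm_52", "sm_60", "sm_61"]))
```

Passing the result of `compute_capability()` and the real target list to `has_kernel_image` gives a quick yes/no on whether the binary can run on that GPU at all, independent of which cuDNN is installed.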

pjh5, Mar 27 '18

Yes, I just ran the mnistCUDNN test and it ran fine.

paciorek, Apr 02 '18