caffe2
trouble with cuDNN versions with CUDA 9.0
I'm using conda to install caffe2 under Ubuntu 16.04 in a conda environment (using the exact syntax for conda installation suggested on the caffe2 installation page) and have one suggestion and one question.
I have CUDA 9.0 installed in /usr/local and different versions of cuDNN (see below as this is part of my question).
The suggestion is to state on the caffe2 installation page exactly which version of cuDNN is required (i.e. "7.x.y" rather than simply "7"). From the error message below, it seems that 7.1.1 is needed, but that version causes an error of its own.
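For anyone wanting to confirm which exact "7.x.y" is installed, here is a minimal sketch that reads the version macros from the text of a cudnn.h header. It assumes the cuDNN 7 header layout, where `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` are plain `#define` lines; the function name `cudnn_version_from_header` and the sample text are hypothetical and used only for illustration.

```python
import re


def cudnn_version_from_header(header_text):
    """Return (major, minor, patch) parsed from the text of a cudnn.h header.

    Assumes the cuDNN 7 layout, where each component is a simple
    '#define NAME <integer>' line.
    """
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("macro %s not found in header text" % name)
        fields[name] = int(match.group(1))
    return (fields["CUDNN_MAJOR"], fields["CUDNN_MINOR"], fields["CUDNN_PATCHLEVEL"])


# Illustrative header excerpt (not copied from a real installation).
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

print(cudnn_version_from_header(sample))
```

On a real machine, the header text would be read from wherever cuDNN was unpacked; a path such as /usr/local/cuda/include/cudnn.h is a common location, but that is an assumption about the local setup.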
My question is how to fix the following error, which occurs when I use cuDNN 7.1.1 for CUDA 9.0:
(/tmp/caffe2-py36) paciorek@scf-sm20:/tmp/caffe2-py36/test> ./elementwise_op_gpu_test
Running main() from gtest_main.cc
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from ElementwiseGPUTest
[ RUN ] ElementwiseGPUTest.And
unknown file: Failure
C++ exception with description "[enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "X" input: "Y" output: "Z" name: "test" type: "And" device_option { device_type: 1 }" thrown in the test body.
[ FAILED ] ElementwiseGPUTest.And (2159 ms)
[ RUN ] ElementwiseGPUTest.Or
unknown file: Failure
C++ exception with description "[enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "X" input: "Y" output: "Z" name: "test" type: "Or" device_option { device_type: 1 }" thrown in the test body.
[ FAILED ] ElementwiseGPUTest.Or (0 ms)
[ RUN ] ElementwiseGPUTest.Xor
unknown file: Failure
In contrast, when I use cuDNN 7.0.5 for CUDA 9.0, the 'version mismatch' error indicates that caffe2 wants 7.1.1 (hence the attempt above). (Note that I get the analogous error with cuDNN 7.1.2.)
(/tmp/caffe2-py36) paciorek@scf-sm20:/tmp/caffe2-py36/test> ./elementwise_op_gpu_test
Running main() from gtest_main.cc
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from ElementwiseGPUTest
[ RUN ] ElementwiseGPUTest.And
unknown file: Failure
C++ exception with description "[enforce fail at common_cudnn.h:118] version_match. cuDNN compiled (7101) and runtime (7005) versions mismatch " thrown in the test body.
[ FAILED ] ElementwiseGPUTest.And (1730 ms)
[ RUN ] ElementwiseGPUTest.Or
unknown file: Failure
C++ exception with description "[enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "X" input: "Y" output: "Z" name: "test" type: "Or" device_option { device_type: 1 }" thrown in the test body.
[ FAILED ] ElementwiseGPUTest.Or (289 ms)
[ RUN ] ElementwiseGPUTest.Xor
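The integers in the mismatch message (7101 and 7005) can be read as dotted versions. The sketch below assumes the cuDNN 7 encoding, in which the version integer is major * 1000 + minor * 100 + patch; the function name `decode_cudnn_version` is hypothetical.

```python
def decode_cudnn_version(code):
    """Turn a cuDNN 7-style version integer into a 'major.minor.patch' string.

    Assumes code == major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return "%d.%d.%d" % (major, minor, patch)


# Values taken from the error message above.
compiled, runtime = 7101, 7005
print("compiled against:", decode_cudnn_version(compiled))  # 7.1.1
print("found at runtime:", decode_cudnn_version(runtime))   # 7.0.5
```

Read this way, the message says the binary was compiled against cuDNN 7.1.1 but loaded cuDNN 7.0.5 at runtime.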
Thanks for discovering this for us. I didn't realize that the cuDNN version had to match exactly. I'll add a note about this.
Did you build from source (conda build conda/cuda...) or use the binary install (conda install -c caffe2 ...)?
I believe we use whatever cuDNN version is on these Docker images: https://hub.docker.com/r/nvidia/cuda/tags/
I used the binary.
Do you know what would cause the error when I do use the matching cuDNN (7.1.1)?
I've seen a few people report this error message, but I've never been able to reproduce it myself, so I can't say for certain. Do you know if your GPU supports the versions of CUDA and cuDNN that you have?
Yes, I just ran the mnistCUDNN test and it ran fine.