DeepBench
Problems with NVIDIA Benchmarks
Environment:
- GPU cards: Tesla K80
- CUDA: 8.0
- cuDNN: 5.1
- OpenMPI: 1.10.2
Problems:
After running make, there are five binaries in .../nvidia/bin:
conv_bench gemm_bench nccl_mpi_all_reduce nccl_single_all_reduce rnn_bench
I can run 'rnn_bench' and 'nccl_single_all_reduce' successfully, but the other three fail:
- 'gemm_bench' fails with the error "terminate called after throwing an instance of 'std::runtime_error'".
- 'conv_bench' stops at the 11th test with the error "terminate called after throwing an instance of 'std::runtime_error' what(): Illegal algorithm passed to get_fwd_algo_string. Algo: 7".
- 'nccl_mpi_all_reduce' fails with the error "terminate called after throwing an instance of 'std::runtime_error' what(): NCCL failure: invalid device pointer in nccl_mpi_all_reduce.cu at line: 86 rank: 0".
How can I fix these?
I haven't really tested DeepBench kernels for K80. Are you sure you compiled with the correct SM version? Are the drivers updated to run with CUDA 8.0?
- gemm_bench: As currently written, gemm_bench will fail on Kepler GPUs with CUDA 8 and later. cublasGemmEx() is only supported on GPUs with SM 5.0 or greater (i.e. Maxwell and newer). https://docs.nvidia.com/cuda/cublas/index.html#cublas-GemmEx
- conv_bench: Algo 7 is CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED, and DeepBench only has a case statement for it in get_fwd_algo_string() when CUDNN_MAJOR >= 6. Maybe a pre-cuDNN v6 header file was in your include path?
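To make the two points above concrete, here is a minimal C++ sketch of both checks. It is not DeepBench's actual code: `supports_gemm_ex`, `fwd_algo_name`, the returned strings, and the `SKETCH_CUDNN_MAJOR` macro are hypothetical stand-ins (the macro plays the role of `CUDNN_MAJOR` from cudnn.h and is fixed at 5 to mimic a pre-v6 header). In a real program the compute capability would come from `cudaDeviceProp::major`, as filled in by `cudaGetDeviceProperties()`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the CUDNN_MAJOR macro that cudnn.h defines.
// Assumption: fixed at 5 to mimic a pre-cuDNN v6 header on the include path.
#define SKETCH_CUDNN_MAJOR 5

// Hypothetical helper: cublasGemmEx() needs compute capability 5.0 or later,
// so checking the major version is enough. In real code, `major` would be
// read from cudaDeviceProp::major.
bool supports_gemm_ex(int major) {
    return major >= 5;
}

// Hypothetical, simplified analogue of get_fwd_algo_string(). The integer
// values follow the cudnnConvolutionFwdAlgo_t enum; the strings are made up.
std::string fwd_algo_name(int algo) {
    switch (algo) {
        case 0: return "IMPLICIT_GEMM";
        case 1: return "IMPLICIT_PRECOMP_GEMM";
        case 2: return "GEMM";
        case 3: return "DIRECT";
        case 4: return "FFT";
        case 5: return "FFT_TILING";
        case 6: return "WINOGRAD";
#if SKETCH_CUDNN_MAJOR >= 6
        // Only compiled in when the header reports cuDNN v6 or newer.
        case 7: return "WINOGRAD_NONFUSED";
#endif
        default:
            // With an older header, algo 7 falls through to this branch.
            throw std::runtime_error(
                "Illegal algorithm passed to get_fwd_algo_string. Algo: " +
                std::to_string(algo));
    }
}
```

The Tesla K80 has compute capability 3.7, so `supports_gemm_ex(3)` returns false, which corresponds to the gemm_bench failure. With the stand-in macro at 5, `fwd_algo_name(7)` throws a `std::runtime_error` whose message ends in "Algo: 7", the same message conv_bench reported.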
I have changed the CUDA version to 7.5 and the cuDNN version to 5.0, and DeepBench can now run the benchmarks, except for 'nccl_mpi_all_reduce'.