Problems with NVIDIA Benchmarks

Open yl-jiang opened this issue 6 years ago • 3 comments

Environment:

  1. GPU cards: Tesla K80
  2. CUDA: 8.0
  3. cuDNN: 5.1
  4. OpenMPI: 1.10.2

Problems:

After running make, there are five files in .../nvidia/bin:

conv_bench gemm_bench nccl_mpi_all_reduce nccl_single_all_reduce rnn_bench

I can run 'rnn_bench' and 'nccl_single_all_reduce' successfully, but the other three fail:

  1. 'gemm_bench' fails with "terminate called after throwing an instance of 'std::runtime_error'".
  2. 'conv_bench' stops during the 11th test with "terminate called after throwing an instance of 'std::runtime_error' what(): Illegal algorithm passed to get_fwd_algo_string. Algo: 7".
  3. 'nccl_mpi_all_reduce' fails with "terminate called after throwing an instance of 'std::runtime_error' what(): NCCL failure: invalid device pointer in nccl_mpi_all_reduce.cu at line: 86 rank: 0".

How can I fix it?

yl-jiang, May 16 '18 03:05

I haven't really tested DeepBench kernels for K80. Are you sure you compiled with the correct SM version? Are the drivers updated to run with CUDA 8.0?
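To make the SM version question concrete: the Tesla K80 is a compute capability 3.7 device, so the build would need to target `sm_37`. The sketch below only shows how that capability maps to an nvcc `-gencode` option. `gencode_flag` is a hypothetical helper, not part of DeepBench, and how the flag reaches the DeepBench Makefile is not covered here.

```cpp
#include <string>

// Hypothetical helper (not part of DeepBench): builds the nvcc -gencode
// option for a given compute capability, e.g. major=3, minor=7 for a K80.
std::string gencode_flag(int major, int minor) {
    // nvcc names architectures by concatenating major and minor: 3.7 -> "37".
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

Calling `gencode_flag(3, 7)` yields `-gencode arch=compute_37,code=sm_37`, which can be compared against the flags that appear in the build output.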

sharannarang, May 22 '18 23:05

  1. As currently written, gemm_bench will fail on Kepler GPUs with CUDA 8 and later. cublasGemmEx() is only supported on GPUs with SM 5.0 or greater (i.e. Maxwell and newer). https://docs.nvidia.com/cuda/cublas/index.html#cublas-GemmEx

  2. Algo 7 is CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED, and DeepBench has a case statement for that in get_fwd_algo_string() when CUDNN_MAJOR >= 6. Maybe a pre-cuDNNv6 header file was in your include path?
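Both points can be illustrated with a small standalone sketch. This is not DeepBench's source: `SKETCH_CUDNN_MAJOR` stands in for `CUDNN_MAJOR` from cudnn.h, `supports_gemm_ex` and `fwd_algo_string` are hypothetical names, and only two of the forward algorithms are listed. With the stand-in set to 5 (as with a cuDNN 5.x header), algo 7 falls through to the default case and throws the same message reported in the issue.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for CUDNN_MAJOR; 5 models building against a cuDNN 5.x header.
#ifndef SKETCH_CUDNN_MAJOR
#define SKETCH_CUDNN_MAJOR 5
#endif

// cublasGemmEx() requires SM 5.0 or greater; a Kepler K80 has SM major 3.
bool supports_gemm_ex(int sm_major) { return sm_major >= 5; }

// Simplified imitation of a version-guarded algo-to-string lookup.
// Integer values follow cudnnConvolutionFwdAlgo_t (subset only).
std::string fwd_algo_string(int algo) {
    switch (algo) {
        case 6: return "WINOGRAD";
#if SKETCH_CUDNN_MAJOR >= 6
        // Only compiled in when the header reports cuDNN 6 or newer.
        case 7: return "WINOGRAD_NONFUSED";
#endif
        default:
            throw std::runtime_error(
                "Illegal algorithm passed to get_fwd_algo_string. Algo: " +
                std::to_string(algo));
    }
}
```

Compiling the same file with `-DSKETCH_CUDNN_MAJOR=6` makes `fwd_algo_string(7)` return a name instead of throwing, which is the behaviour a header mismatch would hide.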

jfurtek, May 25 '18 14:05

I changed the CUDA version to 7.5 and the cuDNN version to 5.0, and DeepBench now runs most of the benchmarks, but not 'nccl_mpi_all_reduce'.

yl-jiang, May 26 '18 10:05