mmdetection

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-10.2'

Open LeungWaiHo opened this issue 3 years ago • 4 comments

This problem appeared after I shut down my computer before all of the model training processes had been killed. When I tried to train again, I got the message above. I have reinstalled the CUDA driver, and my Anaconda environment is set up as described in get_started.md, but the problem still occurs. Why does this happen, and how can I solve it? Also, after I import torch, torch.cuda.is_available() returns False.

LeungWaiHo, Jan 15 '22 07:01

Please check whether the version of your CUDA toolkit is compatible with your PyTorch.

jshilong, Jan 16 '22 11:01

Please check whether the version of your CUDA toolkit is compatible with your PyTorch.

I checked, and they match. The Anaconda environment was built according to get_started.md.

LeungWaiHo, Jan 16 '22 11:01

What did you do to solve this issue? I have CUDA 11.6 installed in my Anaconda environment, and the toolkit packages are shown here:

(screenshot: screen12)

and

(screenshot: screen13)

But when I run nvcc --version, it reports CUDA release 10.1:

(screenshot: screen11)

And running python mmdet/utils/collect_env.py gives me this: No CUDA runtime is found, using CUDA_HOME='/usr'

What should I do? Please help.

julianhund, Aug 02 '22 15:08

Not sure if you still need this. nvidia-smi shows the driver version, along with the highest CUDA version that driver supports. nvcc shows the version of the installed CUDA toolkit, i.e. the runtime. These are two different things. The runtime version is the one that needs to match the CUDA version PyTorch was built with.
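The comparison described here can be scripted. The following is only a minimal sketch: the helper names (parse_nvcc_release, toolkit_release, same_major_minor) are hypothetical and not part of mmdetection or PyTorch, and it assumes nvcc is on PATH and that comparing major.minor is strict enough. torch.version.cuda is the CUDA version the installed PyTorch build was compiled with (None for CPU-only builds).

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' release from `nvcc --version` text."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def toolkit_release() -> Optional[str]:
    """Return the release of the nvcc found on PATH, or None if there is none."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def same_major_minor(a: Optional[str], b: Optional[str]) -> bool:
    """True only when both versions are known and agree on major.minor."""
    if a is None or b is None:
        return False
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # PyTorch is not installed in this interpreter

    torch_cuda = torch.version.cuda if torch is not None else None
    nvcc_release = toolkit_release()
    print("PyTorch built with CUDA:", torch_cuda)
    print("nvcc toolkit release:", nvcc_release)
    print("match:", same_major_minor(torch_cuda, nvcc_release))
    if torch is not None:
        print("torch.cuda.is_available():", torch.cuda.is_available())
```

Run it inside the same conda environment used for training, since a different shell may pick up a different nvcc or a different PyTorch.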

simonnxren, Dec 23 '22 04:12

Hey! I am also facing this issue!

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
$ python3 mmdetection/mmdet/utils/collect_env.py
/home/ubuntu/benchmarks/cuda_116/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
fatal: not a git repository (or any of the parent directories): .git
sys.platform: linux
Python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
CUDA available: True
GPU 0: Tesla V100-SXM2-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.12.0+cu116
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.4.1
    - Built with CuDNN 8.3.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.13.0+cu116
OpenCV: 4.7.0
MMCV: 1.7.1
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.6
MMDetection: 2.28.1+

but when I train with mmdet, I get:

RuntimeError: No CUDA GPUs are available

I tried training with PyTorch only, and everything worked fine.

I used these commands to install mmdet:

pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
python3 -m pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
python3 -m pip install -e .

mkdir checkpoints
wget -c https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth \
      -O checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth

pip install future tensorboard

adolkhan, Feb 27 '23 10:02
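The thread does not establish why "RuntimeError: No CUDA GPUs are available" appears here even though collect_env.py reports CUDA available: True. One thing that may be worth ruling out, purely as an assumption, is that the training process runs with a CUDA_VISIBLE_DEVICES value that hides every GPU. The sketch below uses a hypothetical helper, visible_devices_report, to describe that variable, and then prints torch.cuda.device_count() if PyTorch is importable.

```python
import os
from typing import Mapping


def visible_devices_report(env: Mapping[str, str]) -> str:
    """Describe how CUDA_VISIBLE_DEVICES in `env` affects GPU visibility."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "CUDA_VISIBLE_DEVICES is unset: all GPUs are visible"
    if value.strip() == "":
        return "CUDA_VISIBLE_DEVICES is empty: no GPUs are visible"
    # An ID that does not exist on the machine (e.g. "-1") also hides GPUs.
    return f"CUDA_VISIBLE_DEVICES={value!r}: visibility is restricted to the listed devices"


if __name__ == "__main__":
    print(visible_devices_report(os.environ))
    try:
        import torch
        print("torch.cuda.device_count():", torch.cuda.device_count())
    except ImportError:
        print("PyTorch is not installed in this interpreter")
```

Running it in the exact shell, and with the same launcher, used for training shows whether that process sees the same devices as the standalone PyTorch test.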