error in ms_deformable_im2col_cuda
I have installed the MultiScaleDeformableAttention package, but two errors are printed while the model keeps training:
- `error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device`
- `error in ms_deformable_col2im_coord_cuda: no kernel image is available for execution on the device`
This means you are running the model on a different GPU architecture than the one the MultiScaleDeformableAttention package was compiled for. Please check your setup!
- Run `nvidia-smi` to identify the GPU (here: NVIDIA A10G), then look up its Compute Capability: 8.6.
- Run `nvcc --version` to check the CUDA toolkit version: 11.3, which is compatible with this GPU architecture.
- Install a PyTorch build compatible with CUDA 11.3: `conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch`
- Recompile the MultiScaleDeformableAttention package: set `export TORCH_CUDA_ARCH_LIST="8.6"` and rebuild with `python src/trackformer/models/ops/setup.py build --build-base=src/trackformer/models/ops/ install`
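To make the compatibility check behind these steps concrete, here is a minimal Python sketch. It assumes CUDA's usual rules: a binary built for compute capability X.y runs on devices X.z with z >= y, and a `+PTX` entry can be JIT-compiled for any equal or newer capability. The helper names `parse_arch_list` and `arch_list_covers` are hypothetical and not part of PyTorch or the package. Only numeric entries such as `8.6` or `7.5+PTX` are handled, not named architectures.

```python
import os


def parse_arch_list(arch_list):
    """Parse a TORCH_CUDA_ARCH_LIST-style string into (major, minor, has_ptx) tuples.

    Hypothetical helper: handles numeric entries only (e.g. "7.5;8.6+PTX").
    """
    entries = []
    # Entries may be separated by semicolons or spaces.
    for token in arch_list.replace(";", " ").split():
        has_ptx = token.endswith("+PTX")
        version = token[: -len("+PTX")] if has_ptx else token
        major, minor = version.split(".")
        entries.append((int(major), int(minor), has_ptx))
    return entries


def arch_list_covers(capability, arch_list):
    """Return True if a device with `capability` can run code built for `arch_list`."""
    dev_major, dev_minor = capability
    for major, minor, has_ptx in parse_arch_list(arch_list):
        # Binary code runs on the same major version with an equal or higher minor.
        if major == dev_major and minor <= dev_minor:
            return True
        # PTX can be JIT-compiled for any equal or newer capability.
        if has_ptx and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


if __name__ == "__main__":
    # Optional live check; skipped when PyTorch or a GPU is not available.
    try:
        import torch
    except ImportError:
        torch = None
    arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", "")
    if torch is not None and torch.cuda.is_available() and arch_list:
        capability = torch.cuda.get_device_capability(0)
        print("device capability:", capability)
        print("covered by TORCH_CUDA_ARCH_LIST:", arch_list_covers(capability, arch_list))
```

For the A10G above (capability 8.6), a build with `TORCH_CUDA_ARCH_LIST="8.6"` is covered, while a build made only for `7.5` (without `+PTX`) is not. That mismatch is the situation that produces the "no kernel image is available" error.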
Step 3 (installing a matching PyTorch build) was needed because I got this error: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution (try_all at /opt/conda/conda-bld/pytorch_1591914855613/work/aten/src/ATen/native/cudnn/Conv.cpp:693) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f8d6df54b5e in /opt/conda/envs/trackformer_fix_MSDA/lib/python3.7/site-packages/torch/lib/libc10.so)