trackformer error in ms_deformable_im2col

I have installed the MultiScaleDeformableAttention package, but I get the following two errors, even though the model keeps training:

error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device
error in ms_deformable_col2im_coord_cuda: no kernel image is available for execution on the device

Oct 17 '22 07:10 hahapt

This means you are running the model on a different GPU architecture than the one you used to compile the MultiScaleDeformableAttention package. Please check your setup!
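A rough way to reason about this message: the compiled extension only contains kernels for the architectures that were targeted at build time (set via TORCH_CUDA_ARCH_LIST, or whatever PyTorch picked by default). CUDA can run one of them only if a compiled binary has the GPU's major compute capability with an equal or lower minor version, or if PTX was embedded for an architecture no newer than the GPU. The sketch below encodes that simplified rule. `parse_arch_list` and `has_kernel_image` are hypothetical helpers (not part of TrackFormer or PyTorch), and it assumes only numeric entries such as "8.6" or "7.5+PTX", not named ones.

```python
from typing import List, Tuple

CC = Tuple[int, int]  # compute capability as (major, minor)


def parse_arch_list(arch_list: str) -> List[Tuple[CC, bool]]:
    """Parse a TORCH_CUDA_ARCH_LIST-style string such as "7.5;8.6+PTX".

    Returns ((major, minor), has_ptx) pairs. Assumption: numeric entries only,
    separated by ';' or spaces.
    """
    entries = []
    for token in arch_list.replace(" ", ";").split(";"):
        token = token.strip()
        if not token:
            continue
        has_ptx = token.endswith("+PTX")
        if has_ptx:
            token = token[: -len("+PTX")]
        major, minor = token.split(".")
        entries.append(((int(major), int(minor)), has_ptx))
    return entries


def has_kernel_image(arch_list: str, device_cc: CC) -> bool:
    """Simplified check: would any compiled image run on a GPU with device_cc?"""
    for (major, minor), has_ptx in parse_arch_list(arch_list):
        # Binary code runs on the same major version with an equal or newer minor.
        if major == device_cc[0] and minor <= device_cc[1]:
            return True
        # Embedded PTX can be JIT-compiled for any GPU at least as new.
        if has_ptx and (major, minor) <= device_cc:
            return True
    return False
```

For the A10G discussed below, `has_kernel_image("8.6", (8, 6))` is True, while a build made only for older cards, such as "6.1;7.0", gives False, which is the situation the "no kernel image" error describes.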

Oct 28 '22 18:10 timmeinhardt

1. Run nvidia-smi to identify the GPU (here an NVIDIA A10G) and look up its compute capability: 8.6.
2. Run nvcc --version to check the CUDA toolkit version: 11.3, which is compatible with this GPU architecture.
3. Install a PyTorch build compatible with CUDA 11.3: conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
4. Recompile the MultiScaleDeformableAttention package: run export TORCH_CUDA_ARCH_LIST="8.6", then rebuild with python src/trackformer/models/ops/setup.py build --build-base=src/trackformer/models/ops/ install
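The GPU lookup and the recompile step above can be combined into a small Python helper. This is a sketch assuming PyTorch is already installed and the script is run from the TrackFormer repository root; `detect_capability`, `build_rebuild_command` and `rebuild` are hypothetical names, and the only real APIs used are `torch.cuda.is_available`, `torch.cuda.get_device_capability` and `subprocess.run`.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional, Tuple

# Assumed location of the ops package, taken from the setup.py command above.
OPS_DIR = "src/trackformer/models/ops"


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) of GPU 0 as seen by PyTorch, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def build_rebuild_command(
    capability: Tuple[int, int],
    ops_dir: str = OPS_DIR,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the setup.py command plus an environment pinned to one architecture."""
    env = dict(os.environ if base_env is None else base_env)
    # e.g. (8, 6) -> "8.6", the value used for the A10G above
    env["TORCH_CUDA_ARCH_LIST"] = "{}.{}".format(*capability)
    cmd = [
        sys.executable,
        ops_dir + "/setup.py",
        "build",
        "--build-base=" + ops_dir + "/",
        "install",
    ]
    return cmd, env


def rebuild(ops_dir: str = OPS_DIR) -> None:
    """Recompile the extension for the GPU that this process can see."""
    capability = detect_capability()
    if capability is None:
        raise RuntimeError("No CUDA device visible to PyTorch")
    cmd, env = build_rebuild_command(capability, ops_dir)
    subprocess.run(cmd, env=env, check=True)
```

Calling `rebuild()` on the training machine sets TORCH_CUDA_ARCH_LIST to the detected value (8.6 on an A10G) before invoking the same setup.py command. Detecting the capability on the machine that actually trains avoids compiling for one GPU and running on another.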

Step 3 was necessary because I got RuntimeError: Unable to find a valid cuDNN algorithm to run convolution (try_all at /opt/conda/conda-bld/pytorch_1591914855613/work/aten/src/ATen/native/cudnn/Conv.cpp:693) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f8d6df54b5e in /opt/conda/envs/trackformer_fix_MSDA/lib/python3.7/site-packages/torch/lib/libc10.so)
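Since installing a PyTorch build for CUDA 11.3 is what resolved the cuDNN error here, one quick sanity check before rebuilding is to compare the toolkit release reported by nvcc --version with the CUDA version PyTorch was built against (`torch.version.cuda`). The helpers below are a hypothetical sketch, not part of TrackFormer; they are deliberately strict and only accept an exact major.minor match.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]  # (major, minor)


def parse_nvcc_release(nvcc_output: str) -> Optional[Version]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(torch_cuda_version: Optional[str]) -> Optional[Version]:
    """Parse torch.version.cuda, which is None for CPU-only PyTorch builds."""
    if not torch_cuda_version:
        return None
    major, minor = torch_cuda_version.split(".")[:2]
    return int(major), int(minor)


def toolkit_matches_torch(nvcc_output: str, torch_cuda_version: Optional[str]) -> bool:
    """True only if nvcc and PyTorch report the same CUDA major.minor."""
    toolkit = parse_nvcc_release(nvcc_output)
    return toolkit is not None and toolkit == parse_torch_cuda(torch_cuda_version)


def read_nvcc_output() -> Optional[str]:
    """Run `nvcc --version` if nvcc is on PATH; return its stdout, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the setup described above (nvcc 11.3 plus the cudatoolkit=11.3 PyTorch build), `toolkit_matches_torch(read_nvcc_output() or "", torch.version.cuda)` should return True; a False result is a hint to revisit the PyTorch install before recompiling the extension.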

Mar 14 '24 00:03 Ricardo-Yu