error in ms_deformable_im2col_cuda

Open hahapt opened this issue 3 years ago • 2 comments

I have installed the MultiScaleDeformableAttention package, but I get the following two errors, although the model keeps training:
error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device
error in ms_deformable_col2im_coord_cuda: no kernel image is available for execution on the device

hahapt avatar Oct 17 '22 07:10 hahapt

This means you are running the model on a different GPU architecture than the one you used to compile the MultiScaleDeformableAttention package. Please check your setup!
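As a rough illustration of why the mismatch produces "no kernel image", here is a minimal sketch of the CUDA compatibility rule: a compiled sm_XY binary runs only on devices with the same major version and an equal or higher minor version, while embedded compute_XY PTX can be JIT-compiled for any equal or newer device. The function name kernel_image_available is hypothetical and used only for illustration. Note that torch.cuda.get_arch_list() reports the architectures PyTorch itself was built for, not those of the MultiScaleDeformableAttention extension, so treat the check at the bottom as a sanity check of the PyTorch install only.

```python
import re

# Matches names such as "sm_86", "compute_80" or "sm_90a".
ARCH_RE = re.compile(r"(sm|compute)_(\d+)(\d)[a-z]?")


def kernel_image_available(device_capability, compiled_archs):
    """Return True if any compiled arch can run on the given device.

    device_capability: (major, minor) tuple, e.g. (8, 6) for an A10G.
    compiled_archs: iterable of names like "sm_75" or "compute_70".
    """
    dev_major, dev_minor = device_capability
    for arch in compiled_archs:
        match = ARCH_RE.fullmatch(arch)
        if match is None:
            continue  # ignore names this sketch does not understand
        kind = match.group(1)
        major, minor = int(match.group(2)), int(match.group(3))
        # Binary code: same major version, minor not newer than the device.
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", capability)
        print("PyTorch built for:", archs)
        print("PyTorch supports this GPU:",
              kernel_image_available(capability, archs))
```

For example, an extension built only for sm_75 (a Turing card) has no kernel image that an 8.6 device can run, which would match the error reported above.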

timmeinhardt avatar Oct 28 '22 18:10 timmeinhardt

  1. Run nvidia-smi to identify the GPU (here: NVIDIA A10G), then look up its Compute Capability: 8.6
  2. Run nvcc --version to check the CUDA toolkit version: 11.3, which is compatible with this GPU architecture
  3. Install a PyTorch build compatible with CUDA 11.3: conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
  4. Recompile the MultiScaleDeformableAttention package: set export TORCH_CUDA_ARCH_LIST="8.6" and rebuild with python src/trackformer/models/ops/setup.py build --build-base=src/trackformer/models/ops/ install
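If you do not want to look up the Compute Capability by hand, the value for TORCH_CUDA_ARCH_LIST in step 4 can be derived from what PyTorch reports for the GPU. This is a minimal sketch, assuming PyTorch is already installed with CUDA support; the helper name arch_list_value is made up for illustration, and the optional "+PTX" suffix is the form that PyTorch's extension build accepts for also embedding PTX.

```python
def arch_list_value(capability, with_ptx=False):
    """Format a (major, minor) capability for TORCH_CUDA_ARCH_LIST.

    (8, 6) becomes "8.6"; with_ptx=True appends "+PTX".
    """
    major, minor = capability
    value = f"{major}.{minor}"
    return value + "+PTX" if with_ptx else value


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        print("GPU:", torch.cuda.get_device_name(0))
        # CUDA version PyTorch was built against; compare with nvcc --version.
        print("PyTorch CUDA version:", torch.version.cuda)
        print(f'export TORCH_CUDA_ARCH_LIST="{arch_list_value(capability)}"')
    else:
        print("PyTorch with a visible CUDA device is required for this check.")
```

Run this on the machine that will execute the model, then use the printed export line before rebuilding the package.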

Step 3 was needed because, with my previous PyTorch installation, I got: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution (try_all at /opt/conda/conda-bld/pytorch_1591914855613/work/aten/src/ATen/native/cudnn/Conv.cpp:693) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f8d6df54b5e in /opt/conda/envs/trackformer_fix_MSDA/lib/python3.7/site-packages/torch/lib/libc10.so)

Ricardo-Yu avatar Mar 14 '24 00:03 Ricardo-Yu