CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
When using Mamba, I encountered the following error:
CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
This is the GPU configuration of my server (output of nvidia-smi):
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------|
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro GP100                   Off | 00000000:18:00.0 Off |                    0 |
| 38%   53C    P0              30W / 235W |    428MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Quadro GP100                   Off | 00000000:3B:00.0 Off |                  Off |
| 30%   46C    P0              25W / 235W |      6MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Quadro GP100                   Off | 00000000:3C:00.0 Off |                  Off |
| 33%   49C    P0              27W / 235W |      6MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Quadro GP100                   Off | 00000000:86:00.0 Off |                  Off |
| 37%   54C    P0              28W / 235W |      6MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  Quadro GP100                   Off | 00000000:87:00.0 Off |                  Off |
| 33%   50C    P0              26W / 235W |      6MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  Quadro GP100                   Off | 00000000:AF:00.0 Off |                  Off |
| 35%   51C    P0              28W / 235W |      6MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  Quadro GP100                   Off | 00000000:B0:00.0 Off |                  Off |
| 32%   48C    P0              28W / 235W |      6MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  Quadro GP100                   Off | 00000000:D8:00.0 Off |                  Off |
| 32%   48C    P0              25W / 235W |      6MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3500      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A     45023      C   python                                      420MiB |
|    1   N/A  N/A      3500      G   /usr/lib/xorg/Xorg                            4MiB |
|    2   N/A  N/A      3500      G   /usr/lib/xorg/Xorg                            4MiB |
|    3   N/A  N/A      3500      G   /usr/lib/xorg/Xorg                            4MiB |
|    4   N/A  N/A      3500      G   /usr/lib/xorg/Xorg                            4MiB |
|    5   N/A  N/A      3500      G   /usr/lib/xorg/Xorg                            4MiB |
|    6   N/A  N/A      3500      G   /usr/lib/xorg/Xorg                            4MiB |
|    7   N/A  N/A      3500      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
This is the conda list for the server:
Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
bzip2 1.0.8 h5eee18b_5
ca-certificates 2024.3.11 h06a4308_0
causal-conv1d 1.2.0 pypi_0 pypi
certifi 2022.12.7 pypi_0 pypi
charset-normalizer 2.1.1 pypi_0 pypi
cmake 3.25.0 pypi_0 pypi
cuda-nvcc 11.8.89 0 nvidia/label/cuda-11.8.0
cudatoolkit 11.8.0 h6a678d5_0
einops 0.7.0 pypi_0 pypi
filelock 3.9.0 pypi_0 pypi
fsspec 2024.3.1 pypi_0 pypi
huggingface-hub 0.21.4 pypi_0 pypi
idna 3.4 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
lit 15.0.7 pypi_0 pypi
mamba-ssm 1.2.0 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
monai 1.2.0 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.2.1 pypi_0 pypi
ninja 1.11.1.1 pypi_0 pypi
numpy 1.26.3 pypi_0 pypi
openssl 3.0.13 h7f8727e_0
packaging 23.2 py310h06a4308_0
pillow 10.2.0 pypi_0 pypi
pip 23.3.1 py310h06a4308_0
python 3.10.14 h955ad1f_0
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2023.12.25 pypi_0 pypi
requests 2.28.1 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
setuptools 68.2.2 py310h06a4308_0
sqlite 3.41.2 h5eee18b_0
sympy 1.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.15.2 pypi_0 pypi
torch 2.0.1+cu118 pypi_0 pypi
torchaudio 2.0.2+cu118 pypi_0 pypi
torchvision 0.15.2+cu118 pypi_0 pypi
tqdm 4.66.2 pypi_0 pypi
transformers 4.39.1 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.8.0 pypi_0 pypi
tzdata 2024a h04d1e81_0
urllib3 1.26.13 pypi_0 pypi
wheel 0.41.2 py310h06a4308_0
xz 5.4.6 h5eee18b_0
zlib 1.2.13 h5eee18b_0
These are the steps I followed to set up the environment:
conda create -n your_env_name python=3.10.13
conda activate your_env_name
conda install cudatoolkit==11.8 -c nvidia
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
conda install -c "nvidia/label/cuda-11.8.0" cuda-nvcc
conda install packaging
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
git checkout v1.2.0 # current latest version tag
CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
cd ..
git clone https://github.com/state-spaces/mamba.git
cd ./mamba
git checkout v1.2.0 # current latest version tag
MAMBA_FORCE_BUILD=TRUE pip install .
However, when I run the same code on my own standalone computer, I do not get any error.
This is the GPU configuration of my standalone computer:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 26%   50C    P8    21W / 125W |    548MiB /  6144MiB |     39%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1111      G   /usr/lib/xorg/Xorg                189MiB |
|    0   N/A  N/A      1468      G   /usr/bin/gnome-shell              116MiB |
|    0   N/A  N/A     11669      G   ...--variations-seed-version      147MiB |
|    0   N/A  N/A     33188      G   ...RendererForSitePerProcess       58MiB |
|    0   N/A  N/A    117049      G   ...--variations-seed-version       31MiB |
+-----------------------------------------------------------------------------+
GTX1660S
This is the conda list for my standalone computer:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
_openmp_mutex 4.5 2_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
bzip2 1.0.8 hd590300_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ca-certificates 2024.2.2 hbcca054_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
causal-conv1d 1.2.0 pypi_0 pypi
certifi 2022.12.7 pypi_0 pypi
charset-normalizer 2.1.1 pypi_0 pypi
cmake 3.25.0 pypi_0 pypi
cuda-nvcc 11.8.89 0 nvidia/label/cuda-11.8.0
cudatoolkit 11.8.0 h4ba93d1_13 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
einops 0.7.0 pypi_0 pypi
filelock 3.9.0 pypi_0 pypi
fsspec 2024.3.1 pypi_0 pypi
huggingface-hub 0.21.4 pypi_0 pypi
idna 3.4 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
ld_impl_linux-64 2.40 h41732ed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libffi 3.4.2 h7f98852_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgcc-ng 13.2.0 h807b86a_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgomp 13.2.0 h807b86a_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libnsl 2.0.1 hd590300_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libsqlite 3.45.2 h2797004_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libstdcxx-ng 13.2.0 h7e041cc_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libuuid 2.38.1 h0b41bf4_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libxcrypt 4.4.36 hd590300_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libzlib 1.2.13 hd590300_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
lit 15.0.7 pypi_0 pypi
mamba-ssm 1.2.0 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
monai 1.2.0 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4.20240210 h59595ed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
networkx 3.2.1 pypi_0 pypi
ninja 1.11.1.1 pypi_0 pypi
numpy 1.26.3 pypi_0 pypi
openssl 3.2.1 hd590300_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
packaging 24.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pillow 10.2.0 pypi_0 pypi
pip 24.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
python 3.10.14 hd12c33a_0_cpython https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h8228510_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
regex 2023.12.25 pypi_0 pypi
requests 2.28.1 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
setuptools 69.2.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
sympy 1.12 pypi_0 pypi
tk 8.6.13 noxft_h4845f30_101 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
tokenizers 0.15.2 pypi_0 pypi
torch 2.0.1+cu118 pypi_0 pypi
torchaudio 2.0.2+cu118 pypi_0 pypi
torchvision 0.15.2+cu118 pypi_0 pypi
tqdm 4.66.2 pypi_0 pypi
transformers 4.39.1 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.8.0 pypi_0 pypi
tzdata 2024a h0c530f3_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
urllib3 1.26.13 pypi_0 pypi
wheel 0.43.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xz 5.2.6 h166bdaf_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
The installation steps were the same as the ones listed above.
Same issue here. I installed from the pip wheel on CentOS Stream 9, with driver 550.54 and CUDA 12.4.
This might be a Pascal architecture issue. I am hitting the same error on a GTX 1070 Ti, and in #170 a P100 fails as well while a T4 works. It might be related to the triton dependency, which has some limitations on which GPU architectures it supports.
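For anyone who wants to check whether their card falls into this category, here is a minimal sketch. It relies on two CUDA compatibility rules: a binary built for `sm_XY` runs only on devices with the same major version X and an equal or higher minor version, and PTX built for `compute_XY` can be JIT-compiled for any equal or newer capability. The helper names `parse_arch` and `has_kernel_image` are made up for illustration, and the architecture list in the usage note is hypothetical, not the actual build flags of mamba-ssm. Note also that `torch.cuda.get_arch_list()` describes the PyTorch build itself; a separately compiled extension may target a different set of architectures.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches tags such as "sm_60", "sm_75", "compute_90" or "sm_100".
_ARCH_RE = re.compile(r"(sm|compute)_(\d+)(\d)")


def parse_arch(tag: str) -> Optional[Tuple[str, int, int]]:
    """Split an arch tag into (kind, major, minor); None if the tag is not recognised."""
    m = _ARCH_RE.fullmatch(tag)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def has_kernel_image(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if some entry of arch_list can execute on a device with this capability."""
    major, minor = capability
    for tag in arch_list:
        parsed = parse_arch(tag)
        if parsed is None:
            continue  # skip tags this sketch does not understand (e.g. "sm_90a")
        kind, a_major, a_minor = parsed
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary code: same major, equal or lower minor
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX: can be JIT-compiled for newer devices
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        archs = torch.cuda.get_arch_list()  # architectures PyTorch itself was built for
        for i in range(torch.cuda.device_count()):
            cap = torch.cuda.get_device_capability(i)
            print(torch.cuda.get_device_name(i), cap, has_kernel_image(cap, archs))
    else:
        print("PyTorch with CUDA is not available; call has_kernel_image() by hand.")
```

With a hypothetical build that only targets `["sm_70", "sm_80", "sm_90"]`, `has_kernel_image((6, 0), ...)` is false for a Pascal card such as the Quadro GP100 (compute capability 6.0), while `has_kernel_image((7, 5), ...)` is true for a Turing card such as the GTX 1660 SUPER or T4 (7.5). That would be consistent with the pattern reported in this thread, though it does not confirm the cause.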
If you are on Windows, you can refer to this blog link. I ran into the same problem as you, and following that blog resolved it for me.
I have the same problem. My NVIDIA card is a Quadro P620.