Build failure due to CUDA version mismatch

WoosukKwon opened this issue 2 years ago

I failed to build vLLM with the latest NVIDIA PyTorch Docker image. The reason is that the PyTorch installed by pip is built with CUDA 11.7, while the container uses CUDA 12.1.

RuntimeError:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

WoosukKwon, May 26 '23
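The error above comes from a check that compares two version strings: the CUDA release reported by `nvcc --version` and `torch.version.cuda`, the CUDA version the installed PyTorch wheel was compiled against. Below is a minimal sketch of that comparison, assuming it behaves roughly like `_check_cuda_version` in `torch.utils.cpp_extension` (the function named in the traceback further down). The helper names `nvcc_release` and `check_cuda_match` are hypothetical, and treating only a major-version difference as fatal is an assumption; the real strictness may vary between PyTorch releases.

```python
import re
import warnings
from typing import Optional, Tuple


def nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_cuda_match(nvcc_output: str, torch_cuda: Optional[str]) -> None:
    """Compare the toolkit version with the CUDA version PyTorch was built with.

    `torch_cuda` stands for the value of `torch.version.cuda`, e.g. "11.7"
    (None for CPU-only builds). Assumption: only a major-version difference
    is fatal; a minor-version difference just warns.
    """
    toolkit = nvcc_release(nvcc_output)
    if toolkit is None or torch_cuda is None:
        return  # nothing to compare
    built = tuple(int(part) for part in torch_cuda.split(".")[:2])
    if toolkit[0] != built[0]:
        raise RuntimeError(
            f"The detected CUDA version ({toolkit[0]}.{toolkit[1]}) mismatches "
            f"the version that was used to compile PyTorch ({torch_cuda})."
        )
    if toolkit != built:
        warnings.warn(f"CUDA minor version mismatch: toolkit {toolkit}, PyTorch {built}")


if __name__ == "__main__":
    # A 12.1 toolkit (as in this report) against a CUDA 11.7 PyTorch wheel.
    sample = "Cuda compilation tools, release 12.1, V12.1.66"
    try:
        check_cuda_match(sample, "11.7")
    except RuntimeError as err:
        print(err)
```

The practical takeaway is that either side can be changed: point the build at a toolkit matching `torch.version.cuda`, or install a PyTorch wheel built for the toolkit you have.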

Same issue here. It looks like the build did not use the CUDA 11.8 in the conda environment. Environment: CUDA 11.8, Python 3.8.16, NVIDIA A100 80G, Ubuntu.

File "/tmp/pip-build-env-_5k66uxz/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
    raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError: The detected CUDA version (12.0) mismatches the version that was used to compile PyTorch (11.7). Please make sure to use the same CUDA versions.

(vllm) x@x:~/xx$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Joejoequ, Jun 29 '23

@Joejoequ Thanks for reporting it! I think in your case, the problem can be easily solved by installing the CUDA 11.8 build of PyTorch:

pip3 install torch --index-url https://download.pytorch.org/whl/cu118

WoosukKwon, Jun 29 '23
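The `cu118` part of the index URL above names the CUDA version the wheels were built for; after such an install, `torch.version.cuda` should report `11.8`. A small hypothetical helper that maps a toolkit version (for example the `release 11.8` line from `nvcc -V`) to an index URL, assuming the `https://download.pytorch.org/whl/cuXYZ` naming carries over to other versions. PyTorch only publishes indexes for some CUDA versions, so the resulting URL may not exist.

```python
def torch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as "11.8" to a PyTorch wheel index URL.

    Hypothetical helper: it assumes the cuXYZ naming used by
    https://download.pytorch.org/whl/cu118 and does not check that an
    index for this version is actually published.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


if __name__ == "__main__":
    url = torch_index_url("11.8")
    print(f"pip3 install torch --index-url {url}")
```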

@WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0, since the cudatoolkit installed in the env is 11.8. It looks like the build accesses a CUDA installation outside the env.

Joejoequ, Jun 29 '23


Finally, I installed it successfully by changing the module environment and using "module load" on the Linux server.

Joejoequ, Jul 06 '23

I have the same problem. PyTorch built with CUDA 11.8 is installed, yet I get this error when installing:

The detected CUDA version (12.1) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.

mosheduminer, Aug 04 '23

I'm also having the same issue:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

I have also installed the nightly version of PyTorch with CUDA 12.1 support.

DavidPeleg6, Aug 06 '23

The nvcr.io/nvidia/pytorch:22.12-py3 image is the last one with CUDA 11.8, according to the compatibility matrix. Later images switched to CUDA 12+.

antonpolishko, Aug 27 '23

@WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0, since the cudatoolkit installed in the env is 11.8. It looks like the build accesses a CUDA installation outside the env.

Finally, I installed it successfully by changing the module environment and using "module load" on the Linux server.

@Joejoequ I got the same problem. Can you show me how to solve this?

Ikkyu321, Nov 02 '23

I have the same problem. PyTorch built with CUDA 11.8 is installed, yet I get this error when installing:

The detected CUDA version (12.1) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.

Option 1: You have CUDA 12.1, so simply uninstall your current PyTorch binaries and then reinstall them using:

pip3 install torch torchvision torchaudio

Option 2: If you do not want to use the CUDA 12.1 you have installed, you can use another CUDA version (11.7, 11.8, ...). First uninstall your current CUDA (https://stackoverflow.com/a/56827564), then select the version you want on the CUDA download website.

valentin-fngr, Nov 02 '23

Same problem: The detected CUDA version (12.3) mismatches the version that was used to compile PyTorch (11.7). Please make sure to use the same CUDA versions.

xunfeng1980, Nov 04 '23

Is it possible to use CUDA 11.7? There are other services that require 11.7.

Kkkassini, Nov 07 '23

I'm having the same problem. I've reinstalled PyTorch with CUDA 11.8 support, but I don't know why it still shows this error.

Lumingous, Nov 08 '23

Removing pyproject.toml may be a solution. In my case, the build system was using the version of pytorch from pyproject.toml rather than the pytorch already installed.

jaesuny, Nov 10 '23
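The `/tmp/pip-build-env-_5k66uxz/overlay/` prefix in the traceback earlier in this thread points at pip's build isolation: by default pip builds a project in a temporary environment and installs the packages listed under `[build-system] requires` in `pyproject.toml` there, so the extension can end up compiled against that torch instead of the one already installed. An alternative to deleting the file is pip's `--no-build-isolation` flag, which reuses the current environment; this assumes all build dependencies (torch, setuptools, and so on) are already installed in it. The wrapper function below is an illustrative sketch, not part of vLLM or pip.

```python
import sys


def pip_install_cmd(path: str = ".", editable: bool = True, isolate: bool = False) -> list:
    """Build a pip command for installing a source checkout.

    With isolate=False, pip is told to reuse the current environment
    (and its already-installed torch) instead of creating a temporary
    build environment from the pyproject.toml requirements.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if not isolate:
        cmd.append("--no-build-isolation")
    cmd += ["-e", path] if editable else [path]
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(..., check=True) to run it.
    print(" ".join(pip_install_cmd()))
```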

Removing pyproject.toml may be a solution. In my case, the build system was using the version of pytorch from pyproject.toml rather than the pytorch already installed.

I think it works for me.

xunfeng1980, Nov 11 '23

RuntimeError: The detected CUDA version (12.3) mismatches the version that was used to compile PyTorch (11.6). Please make sure to use the same CUDA versions.

StevenZ-G, Nov 19 '23

RuntimeError: The detected CUDA version (12.3) mismatches the version that was used to compile PyTorch (11.6). Please make sure to use the same CUDA versions.

I resolved this error by downgrading the version of vllm from 0.2.2 to 0.2.1

quanhephia, Nov 20 '23

@WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0, since the cudatoolkit installed in the env is 11.8. It looks like the build accesses a CUDA installation outside the env.

Finally, I installed it successfully by changing the module environment and using "module load" on the Linux server.

Please elaborate.

Mruduldhawley, Dec 05 '23

  1. Changed the env to run with CUDA 11.7
  2. Installed vllm with pip install vllm (without this step I hit a different error)
  3. Then installed from source with pip install -e .

This solved it for me

0-hero, Dec 10 '23

In my case, I found that the build used /usr/bin/nvcc, which prints a different version. The right nvcc is in /usr/local/cuda/bin, so I deleted /usr/bin/nvcc, and now my code works fine.

DuAooo, Dec 25 '23
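When several toolkits are installed, the first `nvcc` on `PATH` may not be the one you expect, as with `/usr/bin/nvcc` shadowing `/usr/local/cuda/bin/nvcc` above. A small sketch that lists every `nvcc` on `PATH` in lookup order, similar in spirit to `which -a nvcc`; the function name `nvcc_candidates` is hypothetical and the sketch assumes Linux (no `.exe` suffix). Reordering `PATH` may be a less destructive fix than deleting a system binary.

```python
import os
from typing import List, Optional


def nvcc_candidates(path_value: Optional[str] = None) -> List[str]:
    """Return every executable file named nvcc on PATH, in shell lookup order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found: List[str] = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, "nvcc")
        is_exec = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_exec and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    # The first entry is the nvcc a build will pick up if it searches PATH.
    for path in nvcc_candidates():
        print(path)
```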

Is it possible to use CUDA 11.7? There are other services that require 11.7.

Is it possible to use vllm with CUDA 11.7? I tried pip install vllm in a conda environment, but the installation still fails.

4daJKong, Jan 12 '24

I solved this by running: conda install nvidia/label/cuda-11.8.0::cuda-nvcc

It looks like your conda env has no nvcc installed, so the build calls your system nvcc, which is not 11.8 or whichever CUDA version you installed in the env.

DrAlexLiu, Jan 25 '24
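This is consistent with how PyTorch's extension builder appears to locate the toolkit. As far as I can tell from `torch.utils.cpp_extension`, it uses the `CUDA_HOME` (or `CUDA_PATH`) environment variable if set, otherwise the directory above the first `nvcc` on `PATH`, and finally falls back to `/usr/local/cuda`; treat this order as an assumption that may differ between releases. A pure-Python sketch of that lookup, with the environment, `which`, and `exists` passed in so it can be tested; `guess_cuda_home` is a hypothetical name, not PyTorch's API.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def guess_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Approximate the assumed toolkit lookup order (Linux only)."""
    # 1. An explicit environment variable wins.
    home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if home:
        return home
    # 2. Otherwise take the parent of the bin/ directory holding the first nvcc on PATH.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the conventional install location, if present.
    default = "/usr/local/cuda"
    return default if exists(default) else None


if __name__ == "__main__":
    print(guess_cuda_home())
```

Under this assumption, an activated conda env with `cuda-nvcc` installed puts its own `bin/` first on `PATH`, so step 2 resolves to the env prefix; without it, step 2 finds the system `nvcc` (for `/usr/bin/nvcc` the guessed home is `/usr`). Setting `CUDA_HOME` explicitly would override both.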

Removing pyproject.toml may be a solution. In my case, the build system was using the version of pytorch from pyproject.toml rather than the pytorch already installed.

Thanks! You are a true hero.

Daishijun, Jan 31 '24

@WoosukKwon is this resolved now?

hmellor, Apr 04 '24

Closing because the build system has changed dramatically since this was opened

hmellor, Apr 20 '24

Removing pyproject.toml may be a solution. In my case, the build system was using the version of pytorch from pyproject.toml rather than the pytorch already installed.

How do I remove pyproject.toml?

oJro, Jul 25 '24