vllm Ubuntu pip installation issue

On Ubuntu 20.04, Python 3.10, pip 23.1.2. The issue persists with Python 3.8 and pip 21.2.4.

Collecting vllm
  Using cached vllm-0.1.0.tar.gz (83 kB)
  Running command pip subprocess to install build dependencies
  Collecting ninja
    Using cached ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)
  Collecting packaging
    Using cached packaging-23.1-py3-none-any.whl (48 kB)
  Collecting setuptools
    Using cached setuptools-68.0.0-py3-none-any.whl (804 kB)
  Collecting torch>=2.0.0
    Using cached torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
  Collecting wheel
    Using cached wheel-0.40.0-py3-none-any.whl (64 kB)
  Collecting filelock (from torch>=2.0.0)
    Using cached filelock-3.12.2-py3-none-any.whl (10 kB)
  Collecting typing-extensions (from torch>=2.0.0)
    Using cached typing_extensions-4.6.3-py3-none-any.whl (31 kB)
  Collecting sympy (from torch>=2.0.0)
    Using cached sympy-1.12-py3-none-any.whl (5.7 MB)
  Collecting networkx (from torch>=2.0.0)
    Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
  Collecting jinja2 (from torch>=2.0.0)
    Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
  Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from torch>=2.0.0)
    Using cached nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
  Collecting nvidia-cuda-runtime-cu11==11.7.99 (from torch>=2.0.0)
    Using cached nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
  Collecting nvidia-cuda-cupti-cu11==11.7.101 (from torch>=2.0.0)
    Using cached nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
  Collecting nvidia-cudnn-cu11==8.5.0.96 (from torch>=2.0.0)
    Using cached nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
  Collecting nvidia-cublas-cu11==11.10.3.66 (from torch>=2.0.0)
    Using cached nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
  Collecting nvidia-cufft-cu11==10.9.0.58 (from torch>=2.0.0)
    Using cached nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
  Collecting nvidia-curand-cu11==10.2.10.91 (from torch>=2.0.0)
    Using cached nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
  Collecting nvidia-cusolver-cu11==11.4.0.1 (from torch>=2.0.0)
    Using cached nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
  Collecting nvidia-cusparse-cu11==11.7.4.91 (from torch>=2.0.0)
    Using cached nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
  Collecting nvidia-nccl-cu11==2.14.3 (from torch>=2.0.0)
    Using cached nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
  Collecting nvidia-nvtx-cu11==11.7.91 (from torch>=2.0.0)
    Using cached nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
  Collecting triton==2.0.0 (from torch>=2.0.0)
    Using cached triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
  Collecting cmake (from triton==2.0.0->torch>=2.0.0)
    Using cached cmake-3.26.4-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (24.0 MB)
  Collecting lit (from triton==2.0.0->torch>=2.0.0)
    Using cached lit-16.0.6-py3-none-any.whl
  Collecting MarkupSafe>=2.0 (from jinja2->torch>=2.0.0)
    Using cached MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
  Collecting mpmath>=0.19 (from sympy->torch>=2.0.0)
    Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
  Installing collected packages: ninja, mpmath, lit, cmake, wheel, typing-extensions, sympy, setuptools, packaging, nvidia-nccl-cu11, nvidia-cufft-cu11, nvidia-cuda-nvrtc-cu11, networkx, MarkupSafe, filelock, nvidia-nvtx-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, jinja2, nvidia-cusolver-cu11, nvidia-cudnn-cu11, triton, torch
  ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
  llama-index 0.6.29 requires typing-extensions==4.5.0, but you have typing-extensions 4.6.3 which is incompatible.
  Successfully installed MarkupSafe-2.1.3 cmake-3.26.4 filelock-3.12.2 jinja2-3.1.2 lit-16.0.6 mpmath-1.3.0 networkx-3.1 ninja-1.11.1 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 packaging-23.1 setuptools-68.0.0 sympy-1.12 torch-2.0.1 triton-2.0.0 typing-extensions-4.6.3 wheel-0.40.0
  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-z55_4n6u/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
    File "/tmp/pip-build-env-z55_4n6u/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 323, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-z55_4n6u/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 338, in run_setup
      exec(code, locals())
    File "<string>", line 59, in <module>
    File "<string>", line 34, in get_nvcc_cuda_version
  TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/bin/python3 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py get_requires_for_build_wheel /tmp/tmppduw77x8
  cwd: /tmp/pip-install-2f8m63zn/vllm_d217e01c274447f6bbe90e3dbc707e73
  Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with

Jun 21 '23 16:06 ElizabethCappon

same issue here

Jun 21 '23 19:06 sharlec

Same problem here. Using Ubuntu via WSL2, CUDA 11.7, RTX 2080Ti, Python 3.10.

Jun 21 '23 21:06 ZQ-Dev8

Same issue. I have been trying to work around it for a while with no success (though I guess I was going in the wrong direction most of the time, trying different PyTorch versions).

CUDA 12.1, Python 3.10.10, and Nvidia 3060 12GB.

Jun 21 '23 22:06 bashirsouid

@ElizabethCappon @sharlec @dcruiz01 @bashirsouid Thanks for reporting the bug. It seems the way vLLM parses the NVCC version does not work under some environments. While I'm investigating the issue, could you use the NVIDIA PyTorch docker image?

Run the following:

# Pull the Docker image with CUDA 11.8.
docker run --gpus all -it --rm --shm-size=8g nvcr.io/nvidia/pytorch:22.12-py3

Inside the docker, run:

pip uninstall torch
pip install vllm

Jun 21 '23 22:06 WoosukKwon

@WoosukKwon I just got it working, maybe my solution will help the others @ElizabethCappon @sharlec @bashirsouid.

My problem was specific to WSL. Turns out I did not have a full cudatoolkit installation (I only had what you get from conda/pip, which is the libraries required to run precompiled cuda code but NOT the full installation). After doing a full install of cudatoolkit 11.8 and updating my path, installing vllm via source worked without issue.
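For anyone wanting to check this before reinstalling, here is a minimal sketch of the lookup described above. The helper name find_nvcc is hypothetical and is not part of vLLM or PyTorch; it only assumes that a full toolkit install puts an nvcc binary under CUDA_HOME/bin or somewhere on PATH. The 'NoneType' and 'str' TypeError in the original log is consistent with no CUDA home being found at all.

```python
import os
import shutil
from typing import Optional


def find_nvcc(cuda_home: Optional[str] = None) -> Optional[str]:
    """Return a path to nvcc, or None if no full CUDA toolkit is visible.

    Hypothetical helper: checks <CUDA_HOME>/bin/nvcc first, then searches PATH.
    """
    cuda_home = cuda_home or os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to whatever nvcc (if any) is reachable through PATH.
    return shutil.which("nvcc")


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found: install the full CUDA toolkit and add its bin directory to PATH")
    else:
        print(f"nvcc found at {nvcc}")
```

If this prints that nvcc was not found, the conda/pip CUDA runtime libraries alone are probably all that is installed.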

Jun 21 '23 22:06 ZQ-Dev8

Oh, silly me, I missed seeing in the docs that CUDA 12 wasn't supported yet 🤦 .

Will try out the other docker image tonight.

Thanks for the advice and great project!

Jun 22 '23 00:06 bashirsouid

I have a similar problem.

In your setup file, get_nvcc_cuda_version uses the system's CUDA_HOME environment variable. I tried to install inside a Conda virtual environment and found that the CUDA and nvcc configured in that environment were not picked up, which caused some issues. (The system-wide version cannot be replaced for historical reasons.)

For example:

whereis nvcc

nvcc: /home1/songjie/.conda/envs/textgen/bin/nvcc.profile /home1/songjie/.conda/envs/textgen/bin/nvcc /usr/local/cuda-11.6/bin/nvcc /usr/local/cuda-11.6/bin/nvcc.profile

And when I compile it automatically:

FAILED: /tmp/tmp65yqajwk.build-temp/csrc/cache_kernels.o

/usr/local/cuda/bin/nvcc -I/tmp/pip-build-env-eofn28ya/overlay/lib/python3.10/site-packages/torch/include -I/tmp/pip-build-env-eofn28ya/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-eofn28ya/overlay/lib/python3.10/site-packages/torch/include/TH -I/tmp/pip-build-env-eofn28ya/overlay/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home1/songjie/.conda/envs/textgen/include/python3.10 -c -c /extdata/server/public_database/llm/vllm/csrc/cache_kernels.cu -o /tmp/tmp65yqajwk.build-temp/csrc/cache_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_89,code=sm_89 --threads 8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0

nvcc fatal   : Unsupported gpu architecture 'compute_89'

I tried calling nvcc by absolute path in the setup file, which should solve this problem. I hope you can change how the setup file references this environment variable.
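To illustrate why the wrong nvcc matters here: the 'compute_89' failure is what one would expect when an older toolkit (the build above picked up /usr/local/cuda) is asked to target an sm_89 GPU. Below is a rough sketch that parses the release field from nvcc -V output and compares it against the toolkit version needed for sm_89. The function names and the two sample strings are made up for illustration, and the 11.8 threshold is stated as an assumption of this sketch.

```python
import re
from typing import Tuple

# Assumption for this sketch: compute capability 8.9 (sm_89) needs CUDA 11.8 or newer.
MIN_CUDA_FOR_SM89 = (11, 8)


def parse_nvcc_release(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def supports_sm89(nvcc_output: str) -> bool:
    """True if the toolkit that produced this output is new enough for sm_89."""
    return parse_nvcc_release(nvcc_output) >= MIN_CUDA_FOR_SM89


# Illustrative sample lines, not taken from the logs in this thread.
sample_116 = "Cuda compilation tools, release 11.6, V11.6.124"
sample_118 = "Cuda compilation tools, release 11.8, V11.8.89"
print(supports_sm89(sample_116))  # False
print(supports_sm89(sample_118))  # True
```

Running the nvcc that the build actually invokes with -V and feeding its output to parse_nvcc_release would show which toolkit is really in use.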

Jun 22 '23 04:06 lovivi

Solved it by uninstalling cuda (version 11.7) and reinstalling (cuda 11.8).

No issues with docker image. Good alternative.

I'm on standard Ubuntu btw, not WSL.

Thanks for the tips and tricks!

Jun 22 '23 18:06 ElizabethCappon

I'm getting this error after pulling the Docker image with CUDA 11.8 and installation of vLLM:

from vllm import LLM, SamplingParams
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/vllm/__init__.py", line 2, in <module>
    from vllm.engine.async_llm_engine import AsyncLLMEngine
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 6, in <module>
    from vllm.engine.llm_engine import LLMEngine
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 15, in <module>
    from vllm.worker.worker import Worker
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 8, in <module>
    from vllm.model_executor import get_model, InputMetadata, set_random_seed
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/__init__.py", line 2, in <module>
    from vllm.model_executor.model_loader import get_model
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/model_loader.py", line 9, in <module>
    from vllm.model_executor.models import (GPT2LMHeadModel, GPTNeoXForCausalLM,
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/__init__.py", line 1, in <module>
    from vllm.model_executor.models.gpt_neox import GPTNeoXForCausalLM
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/gpt_neox.py", line 29, in <module>
    from vllm.model_executor.layers.activation import get_act_fn
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/activation.py", line 5, in <module>
    from vllm import activation_ops
ImportError: /usr/local/lib/python3.8/dist-packages/vllm/activation_ops.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Jun 24 '23 15:06 hxssgaa

@hxssgaa It's an ABI error. Could you run pip uninstall torch and then re-install vllm again?
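As a quick way to read such a symbol: the mangled name in the ImportError contains __cxx11, which suggests the extension expects std::__cxx11::basic_string (the newer libstdc++ string ABI), while the torch being imported does not export that variant. A minimal heuristic sketch follows; uses_cxx11_abi is a hypothetical helper, not a PyTorch or vLLM function.

```python
def uses_cxx11_abi(mangled_symbol: str) -> bool:
    """Heuristic: True if a mangled C++ name refers to std::__cxx11 types."""
    return "__cxx11" in mangled_symbol


# The undefined symbol reported in the ImportError above.
symbol = (
    "_ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)
print(uses_cxx11_abi(symbol))  # True
```

If PyTorch is importable, torch.compiled_with_cxx11_abi() reports which setting that PyTorch build used, so the two can be compared.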

Jun 24 '23 15:06 WoosukKwon

pip install vllm

Thanks, it resolved my issue

Jun 24 '23 15:06 hxssgaa

I encountered this issue and was able to resolve it by adding the location of NVCC to my path.

Jun 26 '23 21:06 jvoas655

@WoosukKwon @dcruiz01 I have the same problem. Running nvcc --version shows CUDA 11.8, and I have installed ninja, packaging, setuptools, torch 2.0.0, and wheel. I'm running on a Jetson Orin 64GB, which is aarch64. Please help.

running build_py
running build_ext
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.8'
Traceback (most recent call last):
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/command/editable_wheel.py", line 155, in run
    self._create_wheel_file(bdist_wheel)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/command/editable_wheel.py", line 344, in _create_wheel_file
    files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/command/editable_wheel.py", line 267, in _run_build_commands
    self._run_build_subcommands()
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/command/editable_wheel.py", line 294, in _run_build_subcommands
    self.run_command(name)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
    _check_cuda_version(compiler_name, compiler_version)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 383, in _check_cuda_version
    torch_cuda_version = packaging.version.parse(torch.version.cuda)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/version.py", line 52, in parse
    return Version(version)
  File "/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/version.py", line 196, in __init__
    match = self._regex.search(version)
TypeError: expected string or bytes-like object
/tmp/pip-build-env-1bv509_l/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py:988: _DebuggingTips: Problem in editable installation.
!!
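One reading of this traceback: the TypeError is raised while parsing torch.version.cuda, which would happen if that value is None, i.e. the torch in the build environment was built without CUDA. The "No CUDA runtime is found" line points the same way. A small sketch to check this follows; describe_torch_cuda is a hypothetical helper, and the interpretation above is an assumption rather than something confirmed in this thread.

```python
from typing import Optional


def describe_torch_cuda(cuda_version: Optional[str]) -> str:
    """Explain what a given torch.version.cuda value implies for CUDA builds."""
    if cuda_version is None:
        return "PyTorch was built without CUDA; CUDA extensions cannot be compiled against it"
    return f"PyTorch was built with CUDA {cuda_version}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        # torch.version.cuda is None when the installed build has no CUDA support.
        print(describe_torch_cuda(torch.version.cuda))
```

Note that pip builds in an isolated environment, so the torch checked here may differ from the one pulled into /tmp/pip-build-env-... during the build.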

Jul 03 '23 12:07 dongkuang

I have the same problem on Windows 11. nvcc --version reports CUDA 11.8.

Jul 12 '23 22:07 jfjensen

Closing this as it appears to be solved.

Mar 06 '24 11:03 hmellor