WSL Ubuntu installation issue
I get the following error when I try to install vLLM on WSL Ubuntu. The main takeaway is:
Building wheel for vllm (pyproject.toml) did not run successfully.
Full error report:
❯ python3 -m pip install vllm
Collecting vllm
Using cached vllm-0.1.0.tar.gz (83 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting ninja (from vllm)
Using cached ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)
Collecting psutil (from vllm)
Using cached psutil-5.9.5-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (282 kB)
Collecting ray (from vllm)
Using cached ray-2.5.0-cp38-cp38-manylinux2014_x86_64.whl (56.2 MB)
Collecting sentencepiece (from vllm)
Using cached sentencepiece-0.1.99-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting numpy (from vllm)
Using cached numpy-1.24.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting torch>=2.0.0 (from vllm)
Using cached torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB)
Collecting transformers>=4.28.0 (from vllm)
Using cached transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting xformers>=0.0.19 (from vllm)
Using cached xformers-0.0.20-cp38-cp38-manylinux2014_x86_64.whl (109.1 MB)
Collecting fastapi (from vllm)
Using cached fastapi-0.97.0-py3-none-any.whl (56 kB)
Collecting uvicorn (from vllm)
Using cached uvicorn-0.22.0-py3-none-any.whl (58 kB)
Collecting pydantic (from vllm)
Using cached pydantic-1.10.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting filelock (from torch>=2.0.0->vllm)
Using cached filelock-3.12.2-py3-none-any.whl (10 kB)
Collecting typing-extensions (from torch>=2.0.0->vllm)
Using cached typing_extensions-4.6.3-py3-none-any.whl (31 kB)
Collecting sympy (from torch>=2.0.0->vllm)
Using cached sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting networkx (from torch>=2.0.0->vllm)
Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting jinja2 (from torch>=2.0.0->vllm)
Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from torch>=2.0.0->vllm)
Using cached nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
Collecting nvidia-cuda-runtime-cu11==11.7.99 (from torch>=2.0.0->vllm)
Using cached nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
Collecting nvidia-cuda-cupti-cu11==11.7.101 (from torch>=2.0.0->vllm)
Using cached nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
Collecting nvidia-cudnn-cu11==8.5.0.96 (from torch>=2.0.0->vllm)
Using cached nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
Collecting nvidia-cublas-cu11==11.10.3.66 (from torch>=2.0.0->vllm)
Using cached nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
Collecting nvidia-cufft-cu11==10.9.0.58 (from torch>=2.0.0->vllm)
Using cached nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
Collecting nvidia-curand-cu11==10.2.10.91 (from torch>=2.0.0->vllm)
Using cached nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
Collecting nvidia-cusolver-cu11==11.4.0.1 (from torch>=2.0.0->vllm)
Using cached nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
Collecting nvidia-cusparse-cu11==11.7.4.91 (from torch>=2.0.0->vllm)
Using cached nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
Collecting nvidia-nccl-cu11==2.14.3 (from torch>=2.0.0->vllm)
Using cached nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
Collecting nvidia-nvtx-cu11==11.7.91 (from torch>=2.0.0->vllm)
Using cached nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
Collecting triton==2.0.0 (from torch>=2.0.0->vllm)
Using cached triton-2.0.0-1-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.2 MB)
Requirement already satisfied: setuptools in ./vllm/lib/python3.8/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=2.0.0->vllm) (68.0.0)
Requirement already satisfied: wheel in ./vllm/lib/python3.8/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=2.0.0->vllm) (0.40.0)
Collecting cmake (from triton==2.0.0->torch>=2.0.0->vllm)
Using cached cmake-3.26.4-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (24.0 MB)
Collecting lit (from triton==2.0.0->torch>=2.0.0->vllm)
Using cached lit-16.0.6-py3-none-any.whl
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers>=4.28.0->vllm)
Using cached huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
Collecting packaging>=20.0 (from transformers>=4.28.0->vllm)
Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting pyyaml>=5.1 (from transformers>=4.28.0->vllm)
Using cached PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (701 kB)
Collecting regex!=2019.12.17 (from transformers>=4.28.0->vllm)
Using cached regex-2023.6.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (772 kB)
Collecting requests (from transformers>=4.28.0->vllm)
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers>=4.28.0->vllm)
Using cached tokenizers-0.13.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers>=4.28.0->vllm)
Using cached safetensors-0.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting tqdm>=4.27 (from transformers>=4.28.0->vllm)
Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting pyre-extensions==0.0.29 (from xformers>=0.0.19->vllm)
Using cached pyre_extensions-0.0.29-py3-none-any.whl (12 kB)
Collecting typing-inspect (from pyre-extensions==0.0.29->xformers>=0.0.19->vllm)
Using cached typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Collecting starlette<0.28.0,>=0.27.0 (from fastapi->vllm)
Using cached starlette-0.27.0-py3-none-any.whl (66 kB)
Collecting attrs (from ray->vllm)
Using cached attrs-23.1.0-py3-none-any.whl (61 kB)
Collecting click>=7.0 (from ray->vllm)
Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting jsonschema (from ray->vllm)
Using cached jsonschema-4.17.3-py3-none-any.whl (90 kB)
Collecting msgpack<2.0.0,>=1.0.0 (from ray->vllm)
Using cached msgpack-1.0.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (322 kB)
Collecting protobuf!=3.19.5,>=3.15.3 (from ray->vllm)
Using cached protobuf-4.23.3-cp37-abi3-manylinux2014_x86_64.whl (304 kB)
Collecting aiosignal (from ray->vllm)
Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting frozenlist (from ray->vllm)
Using cached frozenlist-1.3.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (161 kB)
Collecting grpcio<=1.51.3,>=1.32.0 (from ray->vllm)
Using cached grpcio-1.51.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB)
Collecting h11>=0.8 (from uvicorn->vllm)
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Collecting fsspec (from huggingface-hub<1.0,>=0.14.1->transformers>=4.28.0->vllm)
Using cached fsspec-2023.6.0-py3-none-any.whl (163 kB)
Collecting anyio<5,>=3.4.0 (from starlette<0.28.0,>=0.27.0->fastapi->vllm)
Using cached anyio-3.7.0-py3-none-any.whl (80 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch>=2.0.0->vllm)
Using cached MarkupSafe-2.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting importlib-resources>=1.4.0 (from jsonschema->ray->vllm)
Using cached importlib_resources-5.12.0-py3-none-any.whl (36 kB)
Collecting pkgutil-resolve-name>=1.3.10 (from jsonschema->ray->vllm)
Using cached pkgutil_resolve_name-1.3.10-py3-none-any.whl (4.7 kB)
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 (from jsonschema->ray->vllm)
Using cached pyrsistent-0.19.3-py3-none-any.whl (57 kB)
Collecting charset-normalizer<4,>=2 (from requests->transformers>=4.28.0->vllm)
Using cached charset_normalizer-3.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (195 kB)
Collecting idna<4,>=2.5 (from requests->transformers>=4.28.0->vllm)
Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<3,>=1.21.1 (from requests->transformers>=4.28.0->vllm)
Using cached urllib3-2.0.3-py3-none-any.whl (123 kB)
Collecting certifi>=2017.4.17 (from requests->transformers>=4.28.0->vllm)
Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Collecting mpmath>=0.19 (from sympy->torch>=2.0.0->vllm)
Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Collecting sniffio>=1.1 (from anyio<5,>=3.4.0->starlette<0.28.0,>=0.27.0->fastapi->vllm)
Using cached sniffio-1.3.0-py3-none-any.whl (10 kB)
Collecting exceptiongroup (from anyio<5,>=3.4.0->starlette<0.28.0,>=0.27.0->fastapi->vllm)
Using cached exceptiongroup-1.1.1-py3-none-any.whl (14 kB)
Collecting zipp>=3.1.0 (from importlib-resources>=1.4.0->jsonschema->ray->vllm)
Using cached zipp-3.15.0-py3-none-any.whl (6.8 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect->pyre-extensions==0.0.29->xformers>=0.0.19->vllm)
Using cached mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Building wheels for collected packages: vllm
Building wheel for vllm (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for vllm (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [179 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/vllm
copying vllm/utils.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/logger.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/block.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/config.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/weight_utils.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/model_loader.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/input_metadata.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
creating build/lib.linux-x86_64-cpython-38/vllm/worker
copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-38/vllm/worker
copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-38/vllm/worker
copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/worker
creating build/lib.linux-x86_64-cpython-38/vllm/entrypoints
copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints
copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints
copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints
creating build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/ray_utils.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/tokenizer_utils.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
creating build/lib.linux-x86_64-cpython-38/vllm/core
copying vllm/core/policy.py -> build/lib.linux-x86_64-cpython-38/vllm/core
copying vllm/core/block_manager.py -> build/lib.linux-x86_64-cpython-38/vllm/core
copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-38/vllm/core
copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/core
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/attention.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/parallel_state.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/llama.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/utils.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/layers.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/random.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/mappings.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
creating build/lib.linux-x86_64-cpython-38/vllm/entrypoints/openai
copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints/openai
copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints/openai
copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints/openai
running build_ext
building 'vllm.cache_ops' extension
creating /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38
creating /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.8) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:398: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 11.8
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
Emitting ninja build file /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o.d -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/TH -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/eli/vllm-test/vllm/include -I/usr/include/python3.8 -c -c /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache.cpp -o /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o -g -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o
c++ -MMD -MF /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o.d -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/TH -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/eli/vllm-test/vllm/include -I/usr/include/python3.8 -c -c /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache.cpp -o /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o -g -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:4,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache.cpp:1:
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
12 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
[2/2] /usr/local/cuda/bin/nvcc -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/TH -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/eli/vllm-test/vllm/include -I/usr/include/python3.8 -c -c /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache_kernels.cu -o /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_86,code=sm_86 --threads 8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache_kernels.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/TH -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/eli/vllm-test/vllm/include -I/usr/include/python3.8 -c -c /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache_kernels.cu -o /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_86,code=sm_86 --threads 8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:4,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache_kernels.cu:1:
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
12 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:4,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache_kernels.cu:1:
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
12 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/eli/vllm-test/vllm/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/home/eli/vllm-test/vllm/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/eli/vllm-test/vllm/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 416, in build_wheel
return self._build_with_temp_dir(['bdist_wheel'], '.whl',
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
self.run_setup()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 146, in <module>
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 343, in run
self.run_command("build")
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for vllm
Failed to build vllm
ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects
Similar problem here. Using Ubuntu via WSL2, CUDA 11.7, an RTX 2080 Ti, and Python 3.10.
My problem was specific to WSL. It turns out I did not have a full CUDA Toolkit installation (I only had what you get from conda/pip, which is the runtime libraries needed to run precompiled CUDA code but NOT the full toolkit). After doing a full install of CUDA Toolkit 11.8 and updating my PATH, installing vLLM from source worked without issue.
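For anyone unsure which case applies to them, a quick check (assuming a standard install location) is whether nvcc is present, since the pip/conda packages ship only the runtime libraries and not the compiler:

which nvcc || echo "nvcc not found - only the CUDA runtime libraries are installed"
ls /usr/local | grep -i cuda   # a full toolkit install normally creates /usr/local/cuda-<version>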
You are right, installing the CUDA Toolkit and setting the PATH variables does indeed resolve the issue. The following steps might be helpful for anyone stuck with the same problem (a sketch of the PATH setup follows the commands):
python3 -m venv vllm
source vllm/bin/activate
python3 -m pip install --upgrade pip setuptools wheel
python3 -m ensurepip --upgrade
sudo apt-key del 7fa2af80
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
nvcc --version
python3 -m pip install vllm
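Note that the apt install on its own does not put nvcc on your PATH; what I mean by setting the PATH variables is roughly the following (adjust the version and paths to match your install, and append the exports to ~/.bashrc to make them persistent):

export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
nvcc --version   # should now report release 11.8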
After following those steps, nvcc --version still says it's not installed.