WSL Ubuntu installation issue
I get the following error when I try to install vLLM on WSL Ubuntu. The main takeaway is:
Building wheel for vllm (pyproject.toml) did not run successfully.
Full error report:
❯ python3 -m pip install vllm
Collecting vllm
Using cached vllm-0.1.0.tar.gz (83 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting ninja (from vllm)
Using cached ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)
Collecting psutil (from vllm)
Using cached psutil-5.9.5-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (282 kB)
Collecting ray (from vllm)
Using cached ray-2.5.0-cp38-cp38-manylinux2014_x86_64.whl (56.2 MB)
Collecting sentencepiece (from vllm)
Using cached sentencepiece-0.1.99-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting numpy (from vllm)
Using cached numpy-1.24.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting torch>=2.0.0 (from vllm)
Using cached torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB)
Collecting transformers>=4.28.0 (from vllm)
Using cached transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting xformers>=0.0.19 (from vllm)
Using cached xformers-0.0.20-cp38-cp38-manylinux2014_x86_64.whl (109.1 MB)
Collecting fastapi (from vllm)
Using cached fastapi-0.97.0-py3-none-any.whl (56 kB)
Collecting uvicorn (from vllm)
Using cached uvicorn-0.22.0-py3-none-any.whl (58 kB)
Collecting pydantic (from vllm)
Using cached pydantic-1.10.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting filelock (from torch>=2.0.0->vllm)
Using cached filelock-3.12.2-py3-none-any.whl (10 kB)
Collecting typing-extensions (from torch>=2.0.0->vllm)
Using cached typing_extensions-4.6.3-py3-none-any.whl (31 kB)
Collecting sympy (from torch>=2.0.0->vllm)
Using cached sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting networkx (from torch>=2.0.0->vllm)
Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting jinja2 (from torch>=2.0.0->vllm)
Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from torch>=2.0.0->vllm)
Using cached nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
Collecting nvidia-cuda-runtime-cu11==11.7.99 (from torch>=2.0.0->vllm)
Using cached nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
Collecting nvidia-cuda-cupti-cu11==11.7.101 (from torch>=2.0.0->vllm)
Using cached nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
Collecting nvidia-cudnn-cu11==8.5.0.96 (from torch>=2.0.0->vllm)
Using cached nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
Collecting nvidia-cublas-cu11==11.10.3.66 (from torch>=2.0.0->vllm)
Using cached nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
Collecting nvidia-cufft-cu11==10.9.0.58 (from torch>=2.0.0->vllm)
Using cached nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
Collecting nvidia-curand-cu11==10.2.10.91 (from torch>=2.0.0->vllm)
Using cached nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
Collecting nvidia-cusolver-cu11==11.4.0.1 (from torch>=2.0.0->vllm)
Using cached nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
Collecting nvidia-cusparse-cu11==11.7.4.91 (from torch>=2.0.0->vllm)
Using cached nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
Collecting nvidia-nccl-cu11==2.14.3 (from torch>=2.0.0->vllm)
Using cached nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
Collecting nvidia-nvtx-cu11==11.7.91 (from torch>=2.0.0->vllm)
Using cached nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
Collecting triton==2.0.0 (from torch>=2.0.0->vllm)
Using cached triton-2.0.0-1-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.2 MB)
Requirement already satisfied: setuptools in ./vllm/lib/python3.8/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=2.0.0->vllm) (68.0.0)
Requirement already satisfied: wheel in ./vllm/lib/python3.8/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=2.0.0->vllm) (0.40.0)
Collecting cmake (from triton==2.0.0->torch>=2.0.0->vllm)
Using cached cmake-3.26.4-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (24.0 MB)
Collecting lit (from triton==2.0.0->torch>=2.0.0->vllm)
Using cached lit-16.0.6-py3-none-any.whl
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers>=4.28.0->vllm)
Using cached huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
Collecting packaging>=20.0 (from transformers>=4.28.0->vllm)
Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting pyyaml>=5.1 (from transformers>=4.28.0->vllm)
Using cached PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (701 kB)
Collecting regex!=2019.12.17 (from transformers>=4.28.0->vllm)
Using cached regex-2023.6.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (772 kB)
Collecting requests (from transformers>=4.28.0->vllm)
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers>=4.28.0->vllm)
Using cached tokenizers-0.13.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers>=4.28.0->vllm)
Using cached safetensors-0.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting tqdm>=4.27 (from transformers>=4.28.0->vllm)
Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting pyre-extensions==0.0.29 (from xformers>=0.0.19->vllm)
Using cached pyre_extensions-0.0.29-py3-none-any.whl (12 kB)
Collecting typing-inspect (from pyre-extensions==0.0.29->xformers>=0.0.19->vllm)
Using cached typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Collecting starlette<0.28.0,>=0.27.0 (from fastapi->vllm)
Using cached starlette-0.27.0-py3-none-any.whl (66 kB)
Collecting attrs (from ray->vllm)
Using cached attrs-23.1.0-py3-none-any.whl (61 kB)
Collecting click>=7.0 (from ray->vllm)
Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting jsonschema (from ray->vllm)
Using cached jsonschema-4.17.3-py3-none-any.whl (90 kB)
Collecting msgpack<2.0.0,>=1.0.0 (from ray->vllm)
Using cached msgpack-1.0.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (322 kB)
Collecting protobuf!=3.19.5,>=3.15.3 (from ray->vllm)
Using cached protobuf-4.23.3-cp37-abi3-manylinux2014_x86_64.whl (304 kB)
Collecting aiosignal (from ray->vllm)
Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting frozenlist (from ray->vllm)
Using cached frozenlist-1.3.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (161 kB)
Collecting grpcio<=1.51.3,>=1.32.0 (from ray->vllm)
Using cached grpcio-1.51.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB)
Collecting h11>=0.8 (from uvicorn->vllm)
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Collecting fsspec (from huggingface-hub<1.0,>=0.14.1->transformers>=4.28.0->vllm)
Using cached fsspec-2023.6.0-py3-none-any.whl (163 kB)
Collecting anyio<5,>=3.4.0 (from starlette<0.28.0,>=0.27.0->fastapi->vllm)
Using cached anyio-3.7.0-py3-none-any.whl (80 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch>=2.0.0->vllm)
Using cached MarkupSafe-2.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting importlib-resources>=1.4.0 (from jsonschema->ray->vllm)
Using cached importlib_resources-5.12.0-py3-none-any.whl (36 kB)
Collecting pkgutil-resolve-name>=1.3.10 (from jsonschema->ray->vllm)
Using cached pkgutil_resolve_name-1.3.10-py3-none-any.whl (4.7 kB)
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 (from jsonschema->ray->vllm)
Using cached pyrsistent-0.19.3-py3-none-any.whl (57 kB)
Collecting charset-normalizer<4,>=2 (from requests->transformers>=4.28.0->vllm)
Using cached charset_normalizer-3.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (195 kB)
Collecting idna<4,>=2.5 (from requests->transformers>=4.28.0->vllm)
Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<3,>=1.21.1 (from requests->transformers>=4.28.0->vllm)
Using cached urllib3-2.0.3-py3-none-any.whl (123 kB)
Collecting certifi>=2017.4.17 (from requests->transformers>=4.28.0->vllm)
Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Collecting mpmath>=0.19 (from sympy->torch>=2.0.0->vllm)
Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Collecting sniffio>=1.1 (from anyio<5,>=3.4.0->starlette<0.28.0,>=0.27.0->fastapi->vllm)
Using cached sniffio-1.3.0-py3-none-any.whl (10 kB)
Collecting exceptiongroup (from anyio<5,>=3.4.0->starlette<0.28.0,>=0.27.0->fastapi->vllm)
Using cached exceptiongroup-1.1.1-py3-none-any.whl (14 kB)
Collecting zipp>=3.1.0 (from importlib-resources>=1.4.0->jsonschema->ray->vllm)
Using cached zipp-3.15.0-py3-none-any.whl (6.8 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect->pyre-extensions==0.0.29->xformers>=0.0.19->vllm)
Using cached mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Building wheels for collected packages: vllm
Building wheel for vllm (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for vllm (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [179 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/vllm
copying vllm/utils.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/logger.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/block.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/config.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-38/vllm
copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/weight_utils.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/model_loader.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/input_metadata.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor
creating build/lib.linux-x86_64-cpython-38/vllm/worker
copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-38/vllm/worker
copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-38/vllm/worker
copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/worker
creating build/lib.linux-x86_64-cpython-38/vllm/entrypoints
copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints
copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints
copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints
creating build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/ray_utils.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/tokenizer_utils.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-38/vllm/engine
creating build/lib.linux-x86_64-cpython-38/vllm/core
copying vllm/core/policy.py -> build/lib.linux-x86_64-cpython-38/vllm/core
copying vllm/core/block_manager.py -> build/lib.linux-x86_64-cpython-38/vllm/core
copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-38/vllm/core
copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/core
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/attention.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/layers
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/parallel_state.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
copying vllm/model_executor/models/llama.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/models
creating build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/utils.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/layers.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/random.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/mappings.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
copying vllm/model_executor/parallel_utils/tensor_parallel/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/model_executor/parallel_utils/tensor_parallel
creating build/lib.linux-x86_64-cpython-38/vllm/entrypoints/openai
copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints/openai
copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints/openai
copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-38/vllm/entrypoints/openai
running build_ext
building 'vllm.cache_ops' extension
creating /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38
creating /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.8) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:398: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 11.8
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
Emitting ninja build file /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o.d -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/TH -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/eli/vllm-test/vllm/include -I/usr/include/python3.8 -c -c /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache.cpp -o /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o -g -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o
c++ -MMD -MF /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o.d -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/TH -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/eli/vllm-test/vllm/include -I/usr/include/python3.8 -c -c /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache.cpp -o /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache.o -g -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:4,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache.cpp:1:
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
12 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
[2/2] /usr/local/cuda/bin/nvcc -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/TH -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/eli/vllm-test/vllm/include -I/usr/include/python3.8 -c -c /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache_kernels.cu -o /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_86,code=sm_86 --threads 8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache_kernels.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/TH -I/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/eli/vllm-test/vllm/include -I/usr/include/python3.8 -c -c /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache_kernels.cu -o /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/build/temp.linux-x86_64-cpython-38/csrc/cache_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_86,code=sm_86 --threads 8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:4,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache_kernels.cu:1:
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
12 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:4,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /tmp/pip-install-wqik0was/vllm_24d96c0d02fc4105b0152caf1947d751/csrc/cache_kernels.cu:1:
/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
12 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/eli/vllm-test/vllm/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/home/eli/vllm-test/vllm/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/eli/vllm-test/vllm/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 416, in build_wheel
return self._build_with_temp_dir(['bdist_wheel'], '.whl',
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
self.run_setup()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 146, in <module>
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 343, in run
self.run_command("build")
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/tmp/pip-build-env-9r7c0rnj/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for vllm
Failed to build vllm
ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects
Similar problem here. Using Ubuntu via WSL2, CUDA 11.7, an RTX 2080 Ti, and Python 3.10.
My problem was specific to WSL. It turns out I did not have a full CUDA Toolkit installation (I only had what you get from conda/pip, which is the runtime libraries needed to run precompiled CUDA code but NOT the full toolkit). After doing a full install of CUDA Toolkit 11.8 and updating my PATH, installing vLLM from source worked without issue.
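For anyone unsure which case applies to them, a quick check (assuming a standard install location) is whether nvcc is present, since the pip/conda packages ship only the runtime libraries and not the compiler:

which nvcc || echo "nvcc not found - only the CUDA runtime libraries are installed"
ls /usr/local | grep -i cuda   # a full toolkit install normally creates /usr/local/cuda-<version>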
You are right, installing the CUDA Toolkit and setting the PATH variables does indeed resolve the issue. The following steps might be helpful for anyone stuck with the same problem (a sketch of the PATH setup follows the commands):
python3 -m venv vllm
source vllm/bin/activate
python3 -m pip install --upgrade pip setuptools wheel
python3 -m ensurepip --upgrade
sudo apt-key del 7fa2af80
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
nvcc --version
python3 -m pip install vllm
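Note that the apt install on its own does not put nvcc on your PATH; what I mean by setting the PATH variables is roughly the following (adjust the version and paths to match your install, and append the exports to ~/.bashrc to make them persistent):

export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
nvcc --version   # should now report release 11.8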
After following those steps, nvcc --version still says it's not installed.