
Publish wheels with pre-built CUDA binaries

Open WoosukKwon opened this issue 1 year ago • 2 comments

Currently, pip installing our package takes 5-10 minutes because our CUDA kernels are compiled on the user machine. For better UX, we should include pre-built CUDA binaries in our PyPI distribution, just like PyTorch and xformers.

WoosukKwon avatar Jun 05 '23 00:06 WoosukKwon

Yes, agreed! I encountered the compatibility issue below when trying to install vllm: `RuntimeError: GPUs with compute capability less than 7.0 are not supported.`

Details can be found below:

```
Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      Traceback (most recent call last):
        File "/home/pai/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/pai/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/pai/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-fa8_sj_e/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-fa8_sj_e/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-fa8_sj_e/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 48, in <module>
      RuntimeError: GPUs with compute capability less than 7.0 are not supported.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
```
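For context, the gate that raises this `RuntimeError` amounts to a compute-capability comparison run at build time. A minimal sketch of that check (the function name is illustrative, not vLLM's actual API):

```python
# Hedged sketch of vLLM's compute-capability gate: the build aborts on
# GPUs older than compute capability 7.0 (Volta).

def meets_min_capability(capability, minimum=(7, 0)):
    """Return True if a (major, minor) compute-capability tuple
    satisfies the assumed minimum; tuples compare lexicographically."""
    return capability >= minimum

# With PyTorch installed, the device capability can be queried like:
#   import torch
#   cap = torch.cuda.get_device_capability(0)  # e.g. (8, 6)
#   if not meets_min_capability(cap):
#       raise RuntimeError("GPUs with compute capability less than "
#                          "7.0 are not supported.")
```

Note that a pre-built wheel only moves this check from install time to runtime; it would not make unsupported GPUs work.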

vandesa003 avatar Jun 21 '23 10:06 vandesa003

This would be super helpful. We are in a protected environment (thanks, IT!) where we can only install CUDA via conda. Conda's CUDA packages do not ship cuda.h because of NVIDIA licensing terms, so the vLLM installation fails. Having a pre-built wheel would allow the library to be used by everyone who installs CUDA via conda (e.g. with two different versions of CUDA in two virtual environments).
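A sketch of the per-environment setup described above (the channel label and version numbers are illustrative; adjust to the versions you need):

```shell
# Illustrative: two isolated conda environments, each pinning its own
# CUDA toolkit, so different CUDA versions can coexist on one machine.
conda create -n vllm-cu118 python=3.9 -y
conda activate vllm-cu118
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit -y

# With a wheel that ships pre-built CUDA binaries, pip no longer
# compiles kernels locally, so the missing cuda.h is not a problem:
pip install vllm
```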

andreapiso avatar Jun 22 '23 03:06 andreapiso

@andreapiso As of the latest release, vLLM is published with pre-built CUDA binaries. Please try `pip install vllm` and let us know if it does not work for you.

WoosukKwon avatar Aug 25 '23 04:08 WoosukKwon