[Feature]: Official ROCm Binary to Speed Up vLLM Installation
🚀 The feature, motivation and pitch
The current installation process for vLLM on AMD (ROCm) devices is slow:
- The ROCm Docker image is very large (22 GB compressed), so the download alone can exceed 15 minutes.
- Building vLLM from source takes more than 10 minutes, most of it spent on compilation.
In contrast, CUDA users can install vLLM with a single pip command, which makes setup far quicker.
To improve the installation experience for ROCm users, I propose publishing an official pre-built binary (wheel) for ROCm that can be distributed via pip.
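For illustration, the workflow this feature would enable might look like the sketch below. The ROCm index URL is a placeholder I made up, not an existing endpoint:

```bash
# Today: CUDA users get pre-built wheels directly from PyPI.
pip install vllm

# Hypothetical equivalent for ROCm users once official wheels are published.
# "<official-rocm-wheel-index>" is a placeholder, not a real URL.
pip install vllm --extra-index-url https://<official-rocm-wheel-index>
```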
Alternatives
Currently, to speed up deployments, we build our own pre-built vLLM binary for ROCm, which cuts the entire setup process down to about 2-3 minutes. It would be really helpful to have an official pre-built binary for ROCm so everyone can benefit from this.
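A rough sketch of this workaround, assuming the wheel is built once inside the ROCm dev image and then reused on deployment machines with matching ROCm/PyTorch versions (the `PYTORCH_ROCM_ARCH` value is an assumption; adjust it for your GPUs):

```bash
# Build the wheel once, from a vLLM source checkout inside the ROCm image.
export PYTORCH_ROCM_ARCH="gfx90a;gfx942"   # example GPU targets, adjust as needed
pip wheel --no-deps -w dist/ .

# On each deployment machine, install the pre-built wheel in seconds
# instead of recompiling from source.
pip install dist/vllm-*.whl
```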
Additional context
No response
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.