[Installation]: Build from source and flash-attention (0.7.2)
Your current environment
Isolated system; I can't provide the environment output.
How you are installing vllm
pip install .
Is installing from source with VLLM_FLASH_ATTN_SRC_DIR still supported? I don't see it documented at https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source. Does the vLLM build actually compile flash-attention, or does it just copy files over? There seems to be an issue with a missing target for "_vllm_fa2_C" in the build/temp* directory.
If not, is there any trick to build flash-attention separately and incorporate it into a vLLM installation built from source? (Rough sketch of what I have in mind below.)
(I have an isolated machine with old CUDA that I'm trying to build a version of vLLM for)
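For context, this is roughly what I have been attempting. The paths are placeholders, and I'm only assuming that VLLM_FLASH_ATTN_SRC_DIR is still read by the build and that the vllm-project/flash-attention fork is the right tree to point it at:

```bash
# Sketch only: paths are placeholders, and I'm assuming
# VLLM_FLASH_ATTN_SRC_DIR is still honored so the build uses a local
# flash-attention checkout instead of fetching it over the network
# (which the isolated machine can't do).
git clone https://github.com/vllm-project/flash-attention.git /opt/src/flash-attention
export VLLM_FLASH_ATTN_SRC_DIR=/opt/src/flash-attention
cd /opt/src/vllm
pip install .
```

If that variable is no longer supported, any pointer to the current way of supplying a pre-built or locally checked-out flash-attention would help.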
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.