[Installation]: Build from source and flash-attention (0.7.2)
Your current environment
Isolated system; I can't provide the environment output.
How you are installing vllm
pip install .
Is installing from source using VLLM_FLASH_ATTN_SRC_DIR still supported? I don't see it documented at https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source. Does the vLLM build actually compile flash-attention, or does it just copy files over? There seems to be an issue with a missing target for "_vllm_fa2_C" in the build/temp* directory.
If not, is there any trick to building flash-attention separately and incorporating it into an installation of vLLM built from scratch?
(I have an isolated machine with old CUDA that I'm trying to build a version of vLLM for)
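For reference, a minimal sketch of what I'm attempting, assuming VLLM_FLASH_ATTN_SRC_DIR is still picked up from the environment at build time (paths here are placeholders):

# Sketch only; paths are placeholders, and the env-var handling should be
# verified against the CMake files in your vllm checkout.
cd /path/to/vllm
VLLM_FLASH_ATTN_SRC_DIR=/path/to/flash-attention pip install .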
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
It doesn't appear to be documented in https://docs.vllm.ai/en/latest/serving/env_vars.html
But it is present in https://github.com/vllm-project/vllm/blob/main/CMakeLists.txt
Thanks. I'd been trying to use it (and having problems), so I wanted to make sure it was still "valid".
It turns out my problem was really that I was building against the latest flash-attention tag (2.6.2) rather than against the specific flash-attention hash pinned in vllm's CMakeLists.txt. For whatever reason, configuration against 2.6.2 was not generating the _vllm_fa2_C targets for the build step.
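In case it helps anyone later, a sketch of the fix, assuming the repo URL matches what vllm's CMakeLists.txt fetches and with <pinned-hash> standing in for the revision pinned there (both should be read out of your own checkout):

# Check out the exact revision vllm pins, not the newest upstream tag.
git clone https://github.com/vllm-project/flash-attention.git
cd flash-attention
git checkout <pinned-hash>   # placeholder; copy the hash from CMakeLists.txt
# Then point the vllm build at this source tree:
cd /path/to/vllm
VLLM_FLASH_ATTN_SRC_DIR=/path/to/flash-attention pip install .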
Hello, I ran into the same problem. How did you solve it?
When I did it (~0.7.2), you needed to look in vllm/CMakeLists.txt to find which version of flash-attention was going to be downloaded, and then just grab that by hand. Looking at the repo today, it looks like things have moved around, and that information is now in vllm/cmake/external_projects/vllm_flash_attn.cmake. Search for VLLM_FLASH_ATTN_SRC_DIR to get in the neighborhood and find the git tag you should use.
It looks like the same information is there, but I have not tried building since 0.7.2, so your mileage may vary.
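A quick way to locate that pin in a recent checkout (the file path is per the comment above; grepping for GIT_TAG assumes the pin is expressed via standard CMake FetchContent syntax, which I have not verified):

# Run from the root of a vllm checkout.
grep -n "GIT_TAG" cmake/external_projects/vllm_flash_attn.cmake
grep -rn "VLLM_FLASH_ATTN_SRC_DIR" cmake/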