
[Installation]: Build from source and flash-attention (0.7.2)

hpcpony opened this issue 9 months ago

Your current environment

isolated system, can't provide environment.

How you are installing vllm

pip install .

Is installing from source using VLLM_FLASH_ATTN_SRC_DIR still supported? I don't see it documented at https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source. Does the vLLM build actually compile flash-attention, or does it just copy files over? There seems to be an issue with a missing target for "_vllm_fa2_C" in the build/temp* directory.

If not, is there any trick to building flash-attention separately and incorporating it into an installation of vLLM built from source?

(I have an isolated machine with an old CUDA version that I'm trying to build vLLM for.)
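
Roughly what I have in mind is the sketch below. The paths and the fork URL are my guesses, and the actual pinned revision has to come from vllm's CMakeLists.txt; I'm only showing the shape of the workflow.

```bash
# Sketch only: pre-fetch flash-attention on a connected machine, then point the
# vLLM source build at that checkout via VLLM_FLASH_ATTN_SRC_DIR.
# The repository URL and the placeholder revision are assumptions; the exact
# repo and pinned commit are whatever vllm's CMakeLists.txt declares.
git clone https://github.com/vllm-project/flash-attention.git /opt/src/flash-attention
cd /opt/src/flash-attention
git checkout <revision-pinned-in-vllm-CMakeLists>   # not the latest release tag

# On the isolated machine, build vLLM against that local checkout.
cd /opt/src/vllm
VLLM_FLASH_ATTN_SRC_DIR=/opt/src/flash-attention pip install .
```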

Before submitting a new issue...

  • [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

hpcpony avatar Feb 16 '25 23:02 hpcpony

It doesn't appear to be documented in https://docs.vllm.ai/en/latest/serving/env_vars.html

But it is present in https://github.com/vllm-project/vllm/blob/main/CMakeLists.txt

hmellor avatar Feb 17 '25 12:02 hmellor

Thanks. I'd been trying to use it (and having problems) so I wanted to make sure it was still "valid".

It turns out my problem was that I was building against the latest flash-attention tag (2.6.2) rather than the specific flash-attention commit pinned in vllm's CMakeLists.txt. For whatever reason, configuration was not generating the _vllm_fa2_C target for the build step.
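
In case it helps anyone else, this is roughly what the fix looked like for me; the grep pattern is approximate and assumes the pin is recorded as a GIT_TAG entry, so read CMakeLists.txt directly if it doesn't match.

```bash
# Find the flash-attention revision that vLLM pins (in the 0.7.2-era layout it
# lived in the top-level CMakeLists.txt). The GIT_TAG keyword is an assumption;
# read the FetchContent block directly if the grep comes up empty.
grep -n -i "GIT_TAG" CMakeLists.txt

# Check out exactly that revision in the local flash-attention tree instead of
# the latest release tag, then rebuild vLLM against it as before.
cd /opt/src/flash-attention && git checkout <pinned-revision>
```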

hpcpony avatar Feb 17 '25 19:02 hpcpony

Hello, I ran into the same problem. How did you solve it?

Cppowboy avatar Mar 03 '25 13:03 Cppowboy

When I did it (~0.7.2) you needed to look in vllm/CMakeLists.txt, find which version of flash-attention was going to be downloaded, and then just grab that by hand. Looking at the repo today, it looks like things have moved around and that information is now in vllm/cmake/external_projects/vllm_flash_attn.cmake. Search for VLLM_FLASH_ATTN_SRC_DIR to get in the neighborhood and find the git tag you should use.
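
Something like the following should get you to the pin; I haven't re-checked the exact CMake keywords, so treat the grep patterns as approximate.

```bash
# Locate the pinned flash-attention revision in the current layout. The GIT_TAG
# keyword is an assumption carried over from the older FetchContent setup; if it
# doesn't match, just read the file around the VLLM_FLASH_ATTN_SRC_DIR handling.
grep -n -A5 "VLLM_FLASH_ATTN_SRC_DIR" cmake/external_projects/vllm_flash_attn.cmake
grep -n "GIT_TAG" cmake/external_projects/vllm_flash_attn.cmake
```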

It looks like the same information is there, but I haven't tried building since 0.7.2, so your mileage may vary.

hpcpony avatar Mar 03 '25 21:03 hpcpony