[Installation]: Build from source and flash-attention (0.7.2)
Your current environment
Isolated system; I can't provide the environment output.
How you are installing vllm
pip install .
Is installing from source with VLLM_FLASH_ATTN_SRC_DIR still supported? I don't see it documented at https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source. Does the vLLM build actually compile flash-attention, or does it just copy files over? There seems to be an issue with a missing target for "_vllm_fa2_C" in the build/temp* directory.
If not, is there any trick to build flash-attention separately and incorporate it into a vLLM installation built from source? (Rough sketch of what I have in mind below.)
(I have an isolated machine with old CUDA that I'm trying to build a version of vLLM for)
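For context, this is roughly what I have been attempting. The paths are placeholders, and I'm only assuming that VLLM_FLASH_ATTN_SRC_DIR is still read by the build and that the vllm-project/flash-attention fork is the right tree to point it at:

```bash
# Sketch only: paths are placeholders, and I'm assuming
# VLLM_FLASH_ATTN_SRC_DIR is still honored so the build uses a local
# flash-attention checkout instead of fetching it over the network
# (which the isolated machine can't do).
git clone https://github.com/vllm-project/flash-attention.git /opt/src/flash-attention
export VLLM_FLASH_ATTN_SRC_DIR=/opt/src/flash-attention
cd /opt/src/vllm
pip install .
```

If that variable is no longer supported, any pointer to the current way of supplying a pre-built or locally checked-out flash-attention would help.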
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.