[Installation]: flash-attention internal "git submodule update" problematic for offline-install
Your current environment
N/A
How you are installing vllm
pip install .
I was building vllm offline with local clones of CUTLASS and flash-attention. flash-attention (setup.py) runs "git submodule update" to populate the CUTLASS include files it needs. This is problematic for an offline install. It would be better if it paid attention to VLLM_CUTLASS_SRC_DIR or something like that.
A simple workaround is to copy the CUTLASS include tree into the flash-attention csrc/cutlass subdirectory before building vllm.
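For anyone who wants to script the workaround, here is a minimal sketch. The helper name `populate_cutlass` is hypothetical. The layout is also an assumption: it expects the CUTLASS clone to have a top-level `include` directory and copies it to `csrc/cutlass/include` inside the flash-attention clone. Adjust the destination if your flash-attention checkout expects something different.

```python
import shutil
from pathlib import Path


def populate_cutlass(cutlass_src, flash_attn_src):
    """Copy <cutlass_src>/include to <flash_attn_src>/csrc/cutlass/include.

    Hypothetical helper: the source and destination layout are assumptions
    about the two clones, not something the build system guarantees.
    """
    src = Path(cutlass_src) / "include"
    if not src.is_dir():
        raise FileNotFoundError(f"no include directory under {cutlass_src}")

    dst = Path(flash_attn_src) / "csrc" / "cutlass" / "include"
    # Create csrc/cutlass if the empty submodule directory is missing.
    dst.parent.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run safely.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Call it with the paths of the two local clones before running `pip install .`. This only puts the headers in place; it does not stop setup.py from attempting "git submodule update".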
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.