MiniCPM-V
[BUG] flash_attn is needed but not specified in requirements
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
- [x] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
- [x] 我已经搜索过FAQ | I have searched FAQ
当前行为 | Current Behavior
When running the model using the requirements provided I get this error:
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
期望行为 | Expected Behavior
No error - the requirements file should have listed flash attention, or the need for it should be documented somewhere.
Ideally, either remove the hard binding to flash attention, or provide an installation tutorial and specify which version is required.
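For what it's worth, a common pattern other remote-code models use is to guard the import and fall back to the plain attention path when the wheel is missing. This is purely an illustrative sketch, not the actual MiniCPM-V modeling file:

```python
# Illustrative sketch only, not MiniCPM-V's actual modeling code:
# make flash_attn optional and fall back when the wheel is not installed.
try:
    from flash_attn import flash_attn_func  # fused kernels, used when available
    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH_ATTN = False
```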
复现方法 | Steps To Reproduce
- Install requirements listed in https://huggingface.co/openbmb/MiniCPM-o-2_6#usage
- Run the code in https://huggingface.co/openbmb/MiniCPM-o-2_6#model-initialization (sketched below)
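A rough paraphrase of that initialization code (the exact arguments on the model card may differ); the ImportError is raised inside `from_pretrained()` when transformers checks the imports of the downloaded modeling file:

```python
# Rough paraphrase of the model-card snippet, not a verbatim copy.
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",
    trust_remote_code=True,        # downloads the remote modeling file, whose imports get checked
    attn_implementation="sdpa",    # the import check fails even when flash_attention_2 is not requested
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-o-2_6", trust_remote_code=True)
```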
运行环境 | Environment
- OS: Windows 10 WSL Ubuntu 24
- Python: 3.10
- Transformers: 4.44.2
- PyTorch: 2.3.1+cu121
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1
备注 | Anything else?
Many others have gotten the same error: https://github.com/OpenBMB/MiniCPM-o/issues/429.
This can be fixed by adding flash attention to the requirements file. However, the PyPI release of flash attention has no prebuilt wheels, so a better fix would be to tell people to look up the wheel matching their environment on https://github.com/Dao-AILab/flash-attention/releases and pip install that wheel.
For example, I used https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
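The fields in the wheel filename (cp310, cu122, torch2.3, cxx11abiFALSE) have to match your environment. A small snippet of my own (not from the release page) to read those values off your install:

```python
# Print the tags needed to pick a matching flash-attn wheel from the releases page.
import sys
import torch

print(f"python tag : cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp310
print(f"torch      : {torch.__version__}")                                 # e.g. 2.3.1+cu121
print(f"cuda       : {torch.version.cuda}")                                # e.g. 12.1
print(f"cxx11 abi  : {torch.compiled_with_cxx11_abi()}")                   # cxx11abiTRUE vs cxx11abiFALSE
```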
To get it working I did this:
- Install python 3.10 in WSL (Linux would also work of course)
- Install from this first requirements file. Be prepared for a large download.
--index-url https://download.pytorch.org/whl/cu121
torch==2.3.1
torchaudio==2.3.1
torchvision==0.18.1
- Install from this second requirements file (a quick sanity check is sketched after it)
# reqs adapted from https://huggingface.co/openbmb/MiniCPM-o-2_6
Pillow==10.1.0
transformers==4.44.2
librosa==0.9.0
soundfile==0.12.1
vector-quantize-pytorch==1.18.5
vocos==0.1.0
# Install from prebuilt wheel so user doesn't have to install CUDA toolkit & compile themselves
# https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.3cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
# the wheel above doesn't work, see https://github.com/Dao-AILab/flash-attention/issues/975
https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
# only needed if you work with videos
# decord==0.6.0
# moviepy==2.1.2
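After installing both requirements files, a quick sanity check (my own snippet) that the CUDA build of torch and the flash-attn wheel are actually in place:

```python
# Sanity check before loading the model: CUDA-enabled torch plus an importable flash_attn.
import torch
import flash_attn

print(torch.__version__, torch.cuda.is_available())  # expect 2.3.1+cu121 True
print(flash_attn.__version__)                         # expect 2.5.8 with the wheel above
```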
This fixes the error, but I'm leaving the issue open because the need for flash attention should be documented in the README or on the Hugging Face page.
Sadly I went through the trouble of fixing it only to find out I don't have enough memory for the model 😂
Thanks! We will look into the possibility of removing the dependency on flash attention.
@YuzaChongyi Even after installing flash attention it still doesn't work: vLLM falls back to xformers because it cannot use the FlashAttention-2 backend for head size 72 (https://github.com/vllm-project/vllm/issues/12656). So MiniCPM-V cannot use flash attention with vLLM?