MiniCPM-V
[BUG] flash_attn is needed but not specified in requirements
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
- [x] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
- [x] 我已经搜索过FAQ | I have searched FAQ
当前行为 | Current Behavior
When running the model using the requirements provided I get this error:
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
期望行为 | Expected Behavior
No error - the requirements file should have listed flash attention, or the need for it should be documented somewhere.
Ideally, either remove the hard binding to flash attention, or provide an installation tutorial and specify which version is required.
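For what it's worth, a common pattern other remote-code models use is to guard the import and fall back to the plain attention path when the wheel is missing. This is purely an illustrative sketch, not the actual MiniCPM-V modeling file:

```python
# Illustrative sketch only, not MiniCPM-V's actual modeling code:
# make flash_attn optional and fall back when the wheel is not installed.
try:
    from flash_attn import flash_attn_func  # fused kernels, used when available
    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH_ATTN = False
```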
复现方法 | Steps To Reproduce
- Install requirements listed in https://huggingface.co/openbmb/MiniCPM-o-2_6#usage
- Run the code in https://huggingface.co/openbmb/MiniCPM-o-2_6#model-initialization (sketched below)
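A rough paraphrase of that initialization code (the exact arguments on the model card may differ); the ImportError is raised inside `from_pretrained()` when transformers checks the imports of the downloaded modeling file:

```python
# Rough paraphrase of the model-card snippet, not a verbatim copy.
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",
    trust_remote_code=True,        # downloads the remote modeling file, whose imports get checked
    attn_implementation="sdpa",    # the import check fails even when flash_attention_2 is not requested
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-o-2_6", trust_remote_code=True)
```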
运行环境 | Environment
- OS: Windows 10 WSL Ubuntu 24
- Python: 3.10
- Transformers: 4.44.2
- PyTorch: 2.3.1+cu121
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1
备注 | Anything else?
Many others have gotten the same error: https://github.com/OpenBMB/MiniCPM-o/issues/429.
This can be fixed by adding flash attention to the requirements file. However, the PyPI release of flash attention has no prebuilt wheels, so a better fix would be to tell people to look up the wheel matching their environment on https://github.com/Dao-AILab/flash-attention/releases and pip install that wheel.
For example, I used https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
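The fields in the wheel filename (cp310, cu122, torch2.3, cxx11abiFALSE) have to match your environment. A small snippet of my own (not from the release page) to read those values off your install:

```python
# Print the tags needed to pick a matching flash-attn wheel from the releases page.
import sys
import torch

print(f"python tag : cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp310
print(f"torch      : {torch.__version__}")                                 # e.g. 2.3.1+cu121
print(f"cuda       : {torch.version.cuda}")                                # e.g. 12.1
print(f"cxx11 abi  : {torch.compiled_with_cxx11_abi()}")                   # cxx11abiTRUE vs cxx11abiFALSE
```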
To get it working I did this:
- Install python 3.10 in WSL (Linux would also work of course)
- Install from this first requirements file. Be prepared for a large download.
--index-url https://download.pytorch.org/whl/cu121
torch==2.3.1
torchaudio==2.3.1
torchvision==0.18.1
- Install from this second requirements file (a quick sanity check is sketched after it)
# reqs adapted from https://huggingface.co/openbmb/MiniCPM-o-2_6
Pillow==10.1.0
transformers==4.44.2
librosa==0.9.0
soundfile==0.12.1
vector-quantize-pytorch==1.18.5
vocos==0.1.0
# Install from prebuilt wheel so user doesn't have to install CUDA toolkit & compile themselves
# https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.3cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
# the wheel above doesn't work, see https://github.com/Dao-AILab/flash-attention/issues/975
https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
# only needed if you work with videos
# decord==0.6.0
# moviepy==2.1.2
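After installing both requirements files, a quick sanity check (my own snippet) that the CUDA build of torch and the flash-attn wheel are actually in place:

```python
# Sanity check before loading the model: CUDA-enabled torch plus an importable flash_attn.
import torch
import flash_attn

print(torch.__version__, torch.cuda.is_available())  # expect 2.3.1+cu121 True
print(flash_attn.__version__)                         # expect 2.5.8 with the wheel above
```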
This fixes the error, but I'm leaving the issue open because the need for flash attention should be documented in the README or on the Hugging Face page.
Sadly I went through the trouble of fixing it only to find out I don't have enough memory for the model 😂
Thanks! We will look into the possibility of removing the dependency on flash attention.
@YuzaChongyi Even after installing flash attention it still doesn't work: vLLM falls back to xformers because it cannot use the FlashAttention-2 backend for head size 72 (https://github.com/vllm-project/vllm/issues/12656). So MiniCPM-V cannot use flash attention with vLLM?