[BUG] Version 2.6 is bound to flash_attn by default with no way to disable it, and no matching flash_attn version or installation example is provided.
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
- [X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
- [X] 我已经搜索过FAQ | I have searched FAQ
当前行为 | Current Behavior
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
The 2.6 demo throws this error.
期望行为 | Expected Behavior
Please either make the flash_attn dependency optional, or provide an installation guide that specifies which version to use.
复现方法 | Steps To Reproduce
Run the 2.6 demo script on Linux.
运行环境 | Environment
Python: 3.10
Transformers: 4.40.0
PyTorch: 2.4.0+cu121
CUDA: 12.2
备注 | Anything else?
No response
I don't understand Chinese, but I think we have a similar problem with flash_attn.
I use Python 3.11
and install the libraries in this order:
numpy==1.24.3 Pillow==10.1.0 torch==2.1.2 torchvision==0.16.2 transformers==4.40.0 sentencepiece==0.1.99 accelerate==0.30.1 bitsandbytes==0.43.1 flash_attn
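For anyone hitting build failures on that last flash_attn install: the prebuilt wheels are tied to specific torch, CUDA, Python, and C++ ABI combinations, so it helps to print those before picking one. A minimal sketch (matching the printed values against the wheel names on the flash-attention release page is left to you):

import platform
import sys
import torch

# These values determine which prebuilt flash-attn wheel, if any, fits this machine.
print("python:", ".".join(map(str, sys.version_info[:3])))
print("platform:", platform.system(), platform.machine())
print("torch:", torch.__version__)
print("torch built for CUDA:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("C++11 ABI:", torch.compiled_with_cxx11_abi())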
Has anyone found a solution? I'm stuck here too. I'm on Windows with cu117 and torch 2.1.0 and couldn't find a matching flash_attn package.
Haha, I gave up on Windows and switched to a desktop machine, where it worked on the first try:
RTX 3090 GPU
Ubuntu20.04
Driver Version: 545.23.08
CUDA Version: 12.3
transformers 4.40.0
torch 2.4.0+cu124
torchaudio 2.4.0+cu124
flash-attn 2.6.3
flash-attn installed in one go~~
Ugh, this is so hard. I don't seem to have a Linux machine.
On Windows, just use the prebuilt .whl file that matches your CUDA, cuDNN, and torch versions.
Try installing flash-attn==1.0.4; that works on my side.
Solved it on macOS; wrote it up in a blog post: https://bothsavage.github.io/article/240810-minicpm2.6
Submitted a PR: https://github.com/OpenBMB/MiniCPM-V/pull/461
Modify the web_demo_2.6.py file:
import os
import torch
from typing import Union
from unittest.mock import patch
from transformers import AutoModel
from transformers.dynamic_module_utils import get_imports

# fix the imports: drop the flash_attn requirement from the remote-code import check
def fixed_get_imports(filename: Union[str, os.PathLike]) -> list[str]:
    imports = get_imports(filename)
    if not torch.cuda.is_available() and "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports

# ...
with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                      torch_dtype=torch.bfloat16)  # or torch.float16 on devices without bf16 support
    model = model.to(device=device)
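Note that this patch only skips the flash_attn requirement when torch.cuda.is_available() is False (e.g. CPU or Apple MPS); on a CUDA machine the remote code will still ask for flash_attn unless you install it or switch to a different attention implementation.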
Hi, solution for me:
- Removed the pinned versions of some packages in requirements.txt, and don't forget to uncomment flash_attn and add bitsandbytes, so it lists:
  spacy gradio torch torchvision bitsandbytes flash_attn
- pip install --upgrade pip setuptools wheel  # this solves the torch <-> flash_attn wheel issue
  pip install -r requirements.txt
  Install went fine.
- pip install torch torchvision --upgrade  # needed for me.
  This shows: Successfully installed nvidia-cudnn-cu12-9.1.0.70 nvidia-nccl-cu12-2.20.5 torch-2.4.0 torchvision-0.19.0 triton-3.0.0
- python web_demo_2.6.py --device cuda
Now the web demo starts to run, but I got CUDA out of memory. :) (I've got 12 GB VRAM on an RTX 3060) but that is another problem.
You can also just remove the flash_attn package entirely and it still runs; I saw that other fallback ways to run it are documented.
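For reference, a minimal sketch of what that fallback can look like using plain transformers arguments; whether MiniCPM-V's remote code honors attn_implementation this way, and the exact model id, are assumptions to check against the model card:

import torch
from transformers import AutoModel

model_path = "openbmb/MiniCPM-V-2_6"  # assumed model id; use the path from your demo script

# "sdpa" falls back to PyTorch's built-in scaled_dot_product_attention instead of flash_attn
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
)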
I caught the import error; it may look like /home/xxx/miniconda3/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE, see https://github.com/Dao-AILab/flash-attention/issues/919
So I use flash-attn==2.5.8 since I have torch==2.3.0.
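A quick way to check for that torch/flash-attn ABI mismatch before launching the demo is to import the compiled extension directly; a minimal sketch:

import torch

print("torch:", torch.__version__, "CUDA:", torch.version.cuda)

try:
    # importing flash_attn_func loads flash_attn_2_cuda; an ABI mismatch
    # surfaces here as the same "undefined symbol" ImportError
    import flash_attn
    from flash_attn import flash_attn_func
    print("flash-attn:", flash_attn.__version__, "loads fine")
except ImportError as e:
    print("flash-attn import failed:", e)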
The web_demo_2.6.py fix above is very useful, thx bro.