
Cannot load Hugging Face InternVL3.5 with flash_attn

Open · Shawn-Hwang opened this issue 3 months ago · 1 comment

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

When I tried to load InternVL3.5 using transformers:

import math
import torch
from transformers import AutoTokenizer, AutoModel

path = "OpenGVLab/InternVL3_5-8B"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
    device_map="auto").eval()

I got an error about the use_flash_attn=True argument: TypeError: InternVLForConditionalGeneration.__init__() got an unexpected keyword argument 'use_flash_attn'

Reproduction

import math
import torch
from transformers import AutoTokenizer, AutoModel

path = "OpenGVLab/InternVL3_5-8B"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
    device_map="auto").eval()

Environment

I am not using lmdeploy. I am using transformers=4.56.2, flash_attn=2.8.3.

Error traceback


Shawn-Hwang · Sep 23 '25 17:09

This is likely because you are using the HF-format weights, which are loaded through modeling_internvl.py in the transformers library. That implementation is maintained by the community and does not yet support the use_flash_attn argument. If you need use_flash_attn, please use the custom (trust_remote_code) version of the weights instead.
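
For reference, the transformers-native model classes usually enable FlashAttention-2 through the generic attn_implementation argument rather than a model-specific use_flash_attn flag. A minimal sketch, assuming you keep the HF-format checkpoint and have flash-attn 2.x installed (not verified against this exact model):

import torch
from transformers import AutoModel

path = "OpenGVLab/InternVL3_5-8B"

# HF-format weights: request FlashAttention-2 via the standard
# transformers switch instead of the custom use_flash_attn flag.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    attn_implementation="flash_attention_2",  # generic transformers argument
    device_map="auto",
).eval()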

WesKwong · Sep 24 '25 13:09