[Bug] ValueError: Unsupported architecture: in configuration_internvl_chat.py (InternVL 3.5 SFT)
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Hi InternVL team, thank you for your excellent contributions to the multimodal LLM community! I have a few questions about fine-tuning InternVL 3.5 based on the script `internvl3_5_14b_sft.sh`. :)
Why is `freeze_backbone=False` in InternVL 3.5?
Compared with InternVL 3.0, I noticed that in v3.5 the ViT backbone is not frozen (`freeze_backbone=False`). Could you please share the motivation for continuing to train the ViT in this version? If I have misunderstood something, please feel free to correct me.
`ValueError: Unsupported architecture:` in `configuration_internvl_chat.py`
When fine-tuning, I used `transformers==4.53.2` and `accelerate==1.10.2`. During initialization, an error occurs in `configuration_internvl_chat.py`:
```
ValueError: Unsupported architecture:
```
After tracing the call stack, I found that `llm_config=None` on the second call, even though the model's `config.json` shows `"architectures": ["Qwen3ForCausalLM"]`.
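For reference, this is the temporary instrumentation I used to confirm the double call (debug-only lines pasted at the top of `InternVLChatConfig.__init__`, as the class appears in my checkout of `configuration_internvl_chat.py`):

```python
# Debug-only lines pasted at the top of InternVLChatConfig.__init__,
# where llm_config is the constructor argument already in scope:
import traceback

print(f'[debug] llm_config = {llm_config!r}')  # prints None on the second call
traceback.print_stack(limit=8)                 # shows which caller passed None
```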
In the same file, the fallback branch reads:
```python
if llm_config is None:
    # TODO: There might still be a bug in transformers version 4.44 and above.
    llm_config = {'architectures': ['']}
    logger.info('llm_config is None. Initializing the LlamaConfig config with default values (`LlamaConfig`).')
```
I noticed that setting the default value to `LlamaConfig` resolves the bug (a sketch of my local patch follows the questions below). Could you please elaborate on:
- Why does this bug occur in newer transformers versions? (Why is the config initialized twice, with the correct `llm_config` on the first call but `None` on the second?)
- What impact does setting a default `LlamaConfig` have on model initialization and training behavior?
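For completeness, my local patch looks roughly like this (a sketch under my own assumptions: I am guessing the downstream dispatch keys on the `architectures` string, so I point it at the Llama branch; this is a workaround on my side, not an official fix):

```python
if llm_config is None:
    # Local workaround (not an official fix): select the LlamaConfig
    # branch explicitly instead of passing an empty architecture string,
    # which the downstream dispatch rejects with "Unsupported architecture: ".
    llm_config = {'architectures': ['LlamaForCausalLM']}
    logger.info('llm_config is None. Falling back to LlamaConfig defaults.')
```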
For context: I use `transformers==4.53.2` because Qwen3's config class is only available in newer versions.
Could you share how you handled these issues in your own InternVL 3.5 training pipeline? Were you using a specific transformers version? And finally, do you have plans to release the full InternVL 3.5 training code or configuration soon?
Thanks again for the incredible work and open-source contribution!
Reproduction
Following `internvl3_5_14b_sft.sh`.
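A minimal sketch that triggers the same error on my setup (the checkpoint path below is a placeholder for wherever the InternVL 3.5 weights are stored locally):

```python
from transformers import AutoConfig

# Loading the config alone is enough to hit the error on my setup with
# transformers==4.53.2; the path is a placeholder for the local
# InternVL 3.5 checkpoint directory.
cfg = AutoConfig.from_pretrained(
    './pretrained/InternVL3_5-14B',  # placeholder path
    trust_remote_code=True,          # pulls in configuration_internvl_chat.py
)
```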
Environment
Following `requirements.txt`, with `transformers==4.53.2` and `accelerate==1.10.2`.
Error traceback