InternLM-XComposer ValueError: Input image size (490*490) doesn't match model (336*336).

When I ran the example inference code for the xcomposer2-vl-7b model from its Hugging Face page:

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True)

query = '<ImageHere>Please describe this image in detail.'
image = 'Our image path'

with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)

I got an error: ValueError: Input image size (490*490) doesn't match model (336*336)

Dec 25 '24 09:12 ZTWHHH

I had the same problem, did you solve it?

Dec 30 '24 02:12 dle666

I had the same problem, did you solve it?

I haven't solved it, but XComposer-2.5 works.

Jan 11 '25 19:01 ZTWHHH

Nah I'm struggling

Jan 11 '25 19:01 Moshindeiru

This happens because the installed version of the transformers package is too new. Downgrading it to version 4.40.0 allows the code to run normally.
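As a minimal sketch of the check described above, the snippet below compares the installed transformers version against 4.40.0, the version reported to work in this thread. The helper names (`parse_release`, `is_newer_than_known_good`, `check_transformers`) are hypothetical and introduced only for illustration, and 4.40.0 is treated as a known-good version from this thread, not as an officially documented upper bound.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported to work in this thread; an assumption, not an official bound.
KNOWN_GOOD = "4.40.0"


def parse_release(ver: str) -> tuple:
    """Turn '4.40.0' or '4.41.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component such as 'dev0'.
            break
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than_known_good(installed: str, known_good: str = KNOWN_GOOD) -> bool:
    """Return True when the installed release is newer than the known-good one."""
    return parse_release(installed) > parse_release(known_good)


def check_transformers() -> str:
    """Describe whether the installed transformers version may trigger the error."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if is_newer_than_known_good(installed):
        return (
            f"transformers {installed} is newer than {KNOWN_GOOD}; "
            f"consider: pip install transformers=={KNOWN_GOOD}"
        )
    return f"transformers {installed} is at or below {KNOWN_GOOD}"


if __name__ == "__main__":
    print(check_transformers())
```

Running this before loading the model gives a quick hint about whether a downgrade is worth trying. Pinning the version in a fresh virtual environment avoids affecting other projects that need a newer transformers release.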

Jan 14 '25 02:01 stickydream

This happens because the installed version of the transformers package is too new. Downgrading it to version 4.40.0 allows the code to run normally.

I still hope InternLM can be made compatible with newer versions of transformers.

Jun 23 '25 08:06 wwdok