LLaVA-NeXT

Confusion about the vision_tower parameter.

Open Davidwhw opened this issue 10 months ago • 2 comments

When training LLaVA_OneVision, why do I need to load the vision_tower (SigLIP) in addition to LLaVA_OneVision's own model parameters (lmms-lab/qwen2-0.5b-si)? Does the LLaVA_OneVision checkpoint itself (lmms-lab/qwen2-0.5b-si) not contain all of the model's parameters?

Jan 13 '25 06:01 Davidwhw

I have this question too. I think lmms-lab/qwen2-0.5b-ov may contain only the LLM weights, because loading the vision tower from lmms-lab/qwen2-0.5b-ov failed when I tried it. But the paper says that all parameters are updated at the OneVision stage, so where did the updated vision tower weights go? It's weird.

Jan 14 '25 05:01 zyandtom

> I have this question too. I think lmms-lab/qwen2-0.5b-ov may contain only the LLM weights, because loading the vision tower from lmms-lab/qwen2-0.5b-ov failed when I tried it. But the paper says that all parameters are updated at the OneVision stage, so where did the updated vision tower weights go? It's weird.

I have the same problem when using the llava-next-interleave model.

Jan 15 '25 08:01 alcholiclg
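
One way to test the guess that the checkpoint holds only LLM weights is to list the tensor names stored in its .safetensors shards and count how many belong to each component. The sketch below is a minimal, unofficial example and not part of LLaVA-NeXT. It assumes the checkpoint is stored in safetensors format and reads only the JSON header of each shard (an 8-byte little-endian length followed by JSON). It also assumes that vision tower and projector tensors contain the substrings `vision_tower` and `mm_projector` in their names; those markers, the function names, and the script name are hypothetical and may need adjusting for a given checkpoint.

```python
import json
import struct
import sys
from collections import Counter

# Assumed name fragments (hypothetical); adjust them if the checkpoint
# uses different parameter names.
COMPONENT_MARKERS = {
    "vision_tower": "vision_tower",
    "mm_projector": "mm_projector",
}


def read_safetensors_keys(path):
    """Return the tensor names in one .safetensors file without loading weights."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by a JSON header that maps tensor names to their metadata.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def classify_key(name):
    """Assign a tensor name to a component by substring match."""
    for component, marker in COMPONENT_MARKERS.items():
        if marker in name:
            return component
    return "llm_or_other"


def count_by_component(names):
    """Count tensors per component for an iterable of tensor names."""
    return Counter(classify_key(name) for name in names)


if __name__ == "__main__":
    totals = Counter()
    for shard_path in sys.argv[1:]:
        totals.update(count_by_component(read_safetensors_keys(shard_path)))
    for component, n in sorted(totals.items()):
        print(f"{component}: {n} tensors")
```

Run it against the downloaded shards, for example `python inspect_ckpt.py /path/to/shard.safetensors`. A count of zero for `vision_tower` would support the guess that only LLM weights were saved; a non-zero count would suggest the weights are present and the loading failure has another cause.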
