LLaVA-NeXT
Confusion about the vision_tower parameter.
When training LLaVA_OneVision, why do I need to load the vision tower (SigLIP) separately in addition to LLaVA_OneVision's own model parameters (lmms-lab/qwen2-0.5b-si)?
Could it be that the lmms-lab/qwen2-0.5b-si checkpoint does not contain all of the model's parameters?
I have this question too. I think lmms-lab/qwen2-0.5b-ov may contain only the LLM weights, because when I tried to load the vision tower from lmms-lab/qwen2-0.5b-ov, it failed. But the paper says that at the OneVision stage all parameters are updated, so where did the updated vision tower weights go? It's weird.
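One way to check whether a checkpoint actually ships vision tower weights is to inspect the keys of its state dict and look for a vision-related prefix. Below is a minimal sketch of that idea; the key names here are hypothetical mock data for illustration, not the real lmms-lab/qwen2-0.5b-ov layout. With a real checkpoint you would load the keys via `safetensors.torch.load_file(...)` or `torch.load(...)` instead of the mock list.

```python
# Sketch: group state-dict keys by top-level prefix and check for a
# vision tower. Key names below are mock examples, not the real layout.

def group_keys_by_prefix(state_dict_keys):
    """Count parameter tensors under each top-level module prefix."""
    counts = {}
    for key in state_dict_keys:
        prefix = key.split(".")[0]
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts

# Mock keys imitating a checkpoint that holds only LLM weights
# (note: no "vision_tower" entries anywhere).
mock_keys = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "lm_head.weight",
]

counts = group_keys_by_prefix(mock_keys)
has_vision_tower = any("vision_tower" in k for k in mock_keys)
print(counts)
print("has vision_tower:", has_vision_tower)
```

If the real checkpoint shows no `vision_tower` keys, that would explain why loading the vision tower from it fails and why a separate SigLIP load is required.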
I have the same problem when using llava-next-interleave model.