LLaVA
What is the purpose of `delay_load` for the vision tower?
Describe the issue
https://github.com/haotian-liu/LLaVA/blob/7775b12d6b20cd69089be7a18ea02615a59621cd/llava/model/llava_arch.py#L33
By the way, I am also confused about why we can add tokens when loading a pretrained model:
https://github.com/haotian-liu/LLaVA/blob/7775b12d6b20cd69089be7a18ea02615a59621cd/llava/model/builder.py#L131
Will the randomly initialized embeddings for the new tokens affect the model?
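To make the question concrete: when new special tokens are added, the embedding matrix is resized and the appended rows start out random. Below is a minimal sketch of that behavior using NumPy; the function name mirrors `resize_token_embeddings` from `transformers`, but this is an illustrative stand-in, not the library's implementation.

```python
import numpy as np

def resize_token_embeddings(embed: np.ndarray, new_vocab: int, seed: int = 0) -> np.ndarray:
    """Grow the embedding matrix to new_vocab rows.
    Existing rows (pretrained weights) are kept unchanged;
    the appended rows are randomly initialized, which is what
    happens when tokens are added to a pretrained model."""
    old_vocab, dim = embed.shape
    rng = np.random.default_rng(seed)
    new_rows = rng.normal(0.0, 0.02, size=(new_vocab - old_vocab, dim))
    return np.concatenate([embed, new_rows], axis=0)

# Pretrained embeddings for a 4-token vocab, hidden size 3.
pretrained = np.ones((4, 3))
# Add two special tokens (e.g. image-boundary tokens): rows 4-5 are random.
resized = resize_token_embeddings(pretrained, 6)
print(resized.shape)  # (6, 3)
```

Only the new rows are random, so the pretrained behavior is untouched until those tokens actually appear in the input; their embeddings are then expected to be trained (or at least tolerated) during fine-tuning.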
I'm confused too. Obviously, there are some vision_tower weights in llava-v1.6.

~~If the LLM is loaded first with `LlavaLlamaForCausalLM.from_pretrained` and the vision_tower is loaded afterwards, the vision_tower weights in llava-v1.6 would be overwritten by `openai/clip-vit-large-patch14-336`.~~

The struck-through statement above is wrong: when `unfreeze_mm_vision_tower` is `True`, the vision_tower is not delay-loaded.
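The delay-load pattern itself can be sketched in a few lines. This is an illustrative stand-in for how a lazily loaded vision tower behaves (the class and attribute names are assumptions, not LLaVA's exact API): with `delay_load=True`, the wrapper is constructed without fetching CLIP weights, so the checkpoint's own vision_tower weights can be loaded into it instead of being clobbered.

```python
class VisionTower:
    """Minimal sketch of a delay-loadable vision tower (names are illustrative)."""

    def __init__(self, model_name: str, delay_load: bool = False):
        self.model_name = model_name
        self.is_loaded = False
        self.weights = None
        if not delay_load:
            self.load_model()

    def load_model(self):
        """Fetch the pretrained weights; in LLaVA this would call
        CLIPVisionModel.from_pretrained(self.model_name)."""
        if self.is_loaded:
            return
        self.weights = f"weights-of-{self.model_name}"
        self.is_loaded = True

# With delay_load=True, construction is cheap and weight-free...
tower = VisionTower("openai/clip-vit-large-patch14-336", delay_load=True)
print(tower.is_loaded)  # False

# ...and the real weights are loaded later, on demand.
tower.load_model()
print(tower.is_loaded)  # True
```

The point of the pattern is sequencing: if the tower eagerly pulled `openai/clip-vit-large-patch14-336` at construction time and the surrounding checkpoint then did not contain vision weights, that would be fine; but when the checkpoint *does* carry fine-tuned vision weights (as in llava-v1.6), the tower must not be delay-loaded in a way that lets stock CLIP weights win.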
Me too.
@xylcbd I looked at the code again and found that there is no problem in it.
For llava-1.5, the vision_tower weights are identical to OpenAI's released CLIP weights, so it does not matter whether the tower is delay-loaded or not.
For llava-1.6, there is a parameter called `unfreeze_mm_vision_tower` in config.json, and when it is `True`, the vision_tower model is not delay-loaded.
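The decision described above can be summed up in one predicate. The helper name below is hypothetical; only the `unfreeze_mm_vision_tower` config key comes from the thread. The logic assumed here: when the flag is set (llava-1.6 with a fine-tuned tower), the tower is built and loaded immediately from the checkpoint; otherwise (llava-1.5), delay-loading is safe because the weights match stock CLIP anyway.

```python
def should_delay_load(config: dict) -> bool:
    """Sketch of the builder-side decision (helper name is hypothetical).
    If unfreeze_mm_vision_tower is True, the checkpoint carries fine-tuned
    vision weights, so the tower must NOT be delay-loaded; they would
    otherwise risk being replaced by stock CLIP weights."""
    return not config.get("unfreeze_mm_vision_tower", False)

print(should_delay_load({}))                                  # True  (llava-1.5 style)
print(should_delay_load({"unfreeze_mm_vision_tower": True}))  # False (llava-1.6 style)
```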