
What is the purpose of delay_load for the vision tower?

Open Zeqiang-Lai opened this issue 1 year ago • 3 comments

Describe the issue

https://github.com/haotian-liu/LLaVA/blob/7775b12d6b20cd69089be7a18ea02615a59621cd/llava/model/llava_arch.py#L33
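
For context, delay_load controls whether the CLIP weights are fetched inside the vision tower's constructor. A minimal, paraphrased sketch of the pattern (the real code lives in llava/model/multimodal_encoder/clip_encoder.py; names and details are simplified here):

```python
import torch.nn as nn
from transformers import CLIPImageProcessor, CLIPVisionModel


class CLIPVisionTower(nn.Module):
    """Simplified stand-in for LLaVA's vision tower wrapper."""

    def __init__(self, vision_tower_name, delay_load=False):
        super().__init__()
        self.is_loaded = False
        self.vision_tower_name = vision_tower_name
        if not delay_load:
            # Eager path: fetch the CLIP weights immediately.
            self.load_model()
        # Delayed path: the module stays an empty shell until load_model()
        # is called explicitly, e.g. after the LLaVA checkpoint itself has
        # been loaded.

    def load_model(self):
        self.image_processor = CLIPImageProcessor.from_pretrained(self.vision_tower_name)
        self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name)
        self.vision_tower.requires_grad_(False)
        self.is_loaded = True
```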

Btw, I am also confused about why we can add tokens when loading a pretrained model:

https://github.com/haotian-liu/LLaVA/blob/7775b12d6b20cd69089be7a18ea02615a59621cd/llava/model/builder.py#L131

Won't the randomly initialized embeddings for the new tokens affect the model?
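
For reference, the token-adding step amounts to something like the sketch below (simplified; the token strings and base model name are illustrative, check builder.py for the actual constants):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative base model; the real builder loads a LLaVA checkpoint.
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")

# Append new special tokens to the vocabulary ...
num_new = tokenizer.add_tokens(["<im_start>", "<im_end>"], special_tokens=True)
# ... and grow the embedding matrix (and any tied lm_head) to match.
model.resize_token_embeddings(len(tokenizer))
# Only the `num_new` appended rows are randomly initialized; all existing
# rows are preserved. If the checkpoint was trained with these tokens, its
# saved embeddings overwrite the random rows when the state dict is loaded.
```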

Zeqiang-Lai avatar Dec 08 '23 12:12 Zeqiang-Lai

I'm confused too. Clearly, there are vision_tower weights in the llava-v1.6 checkpoint.

[image: vision_tower weights listed in the llava-v1.6 checkpoint]

~If the LLM is loaded first with LlavaLlamaForCausalLM.from_pretrained and the vision_tower is loaded afterwards, the vision_tower weights in llava-v1.6 will be overwritten by openai/clip-vit-large-patch14-336.~

The above is wrong: when unfreeze_mm_vision_tower is True, the vision_tower does not delay-load.

irexyc avatar Apr 10 '24 03:04 irexyc

me too.

xylcbd avatar Apr 17 '24 02:04 xylcbd

@xylcbd I looked at the code again and found there is no problem in it.

For llava-1.5, the vision_tower weights are identical to the openai/clip-vit-large-patch14-336 weights, so it does not matter whether the load is delayed or not.

For llava-1.6, there is a parameter called unfreeze_mm_vision_tower in config.json, and when it is True the vision_tower does not delay-load, so its fine-tuned weights are restored from the LLaVA checkpoint rather than from the original CLIP checkpoint.
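
Put differently, the decision roughly looks like the sketch below (a hedged paraphrase of the branch, not a verbatim quote of the repo):

```python
def maybe_load_now(vision_tower, config, delay_load: bool) -> None:
    """Sketch of the delay-load decision; `config` is the parsed config.json."""
    if not delay_load or getattr(config, "unfreeze_mm_vision_tower", False):
        # llava-1.6 style: the tower was fine-tuned, so it is instantiated
        # up front and the LLaVA checkpoint's state dict then populates its
        # weights with the fine-tuned values.
        vision_tower.load_model()
    # Otherwise (llava-1.5 style): the tower equals
    # openai/clip-vit-large-patch14-336, so loading it later from the CLIP
    # checkpoint loses nothing.
```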

irexyc avatar Apr 22 '24 07:04 irexyc