Video-LLaVA
Error while loading finetuned model for inference.
I am able to pretrain and finetune (using LoRA) the Video-LLaVA model using the scripts at https://github.com/PKU-YuanGroup/Video-LLaVA/tree/main/scripts/v1_5. I used --model_name_or_path 'LanguageBind/Video-LLaVA-7B' for both scripts, pretrain and finetune with LoRA.
When I try to run the finetuned model for inference, I get the following error:
```
RuntimeError: Error(s) in loading state_dict for LlavaLlamaForCausalLM:
size mismatch for model.mm_projector.0.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([2097152, 1]).
size mismatch for model.mm_projector.2.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]).
```
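Side note: the target shapes in the error look like packed 4-bit buffers rather than real weight matrices. A quick sanity check of that guess (assuming bitsandbytes stores a 4-bit-quantized tensor as a uint8 buffer of shape [numel // 2, 1], two 4-bit values per byte):

```python
# mm_projector weight sizes from the checkpoint, packed two 4-bit values per byte
proj0 = 4096 * 1024   # mm_projector.0.weight: 4,194,304 elements
proj2 = 4096 * 4096   # mm_projector.2.weight: 16,777,216 elements
print(proj0 // 2)     # 2097152 -> matches torch.Size([2097152, 1])
print(proj2 // 2)     # 8388608 -> matches torch.Size([8388608, 1])
```

So the base model is apparently being loaded quantized at inference time.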
I ran into the same problem. Has anyone found a solution? Much appreciated @tarunmis
I think with the default finetune_lora.sh script, the bf16 field is set to True, meaning the training is done in 16 bits by default. When loading the model for inference, setting load_4bit, load_8bit = False, False will load the base model in full precision before applying the LoRA part on top of it successfully (at least in my case); with 4-bit loading enabled, the mm_projector layers are quantized into packed buffers, which is what produces the size mismatch above.
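Concretely, here is a minimal loading sketch (assuming Video-LLaVA exposes load_pretrained_model and get_model_name_from_path as in upstream LLaVA; the checkpoint path is a placeholder from my setup):

```python
from videollava.model.builder import load_pretrained_model
from videollava.mm_utils import get_model_name_from_path

model_path = "./checkpoints/videollava-7b-lora"  # LoRA finetune output dir (placeholder)
model_base = "LanguageBind/Video-LLaVA-7B"       # base model the LoRA was trained on

# Keep both quantization flags off so the base weights (including
# mm_projector) load in full precision before the LoRA weights are merged.
tokenizer, model, processor, context_len = load_pretrained_model(
    model_path,
    model_base,
    get_model_name_from_path(model_path),
    load_8bit=False,
    load_4bit=False,
)
```

In upstream LLaVA's builder, the LoRA-merge branch is taken when the model name contains 'lora' and model_base is given; I assume Video-LLaVA behaves the same, so keep 'lora' in the checkpoint directory name.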