
Error while loading finetuned model for inferencing.

tarunmis opened this issue 1 year ago · 2 comments

I am able to pretrain and finetune (using LoRA) the Video-LLaVA model with the scripts at https://github.com/PKU-YuanGroup/Video-LLaVA/tree/main/scripts/v1_5. I used --model_name_or_path 'LanguageBind/Video-LLaVA-7B' for both scripts, pretrain and finetune with LoRA.

When I try to run the finetuned model for inference, I get the following error:

RuntimeError: Error(s) in loading state_dict for LlavaLlamaForCausalLM:
        size mismatch for model.mm_projector.0.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([2097152, 1]).
        size mismatch for model.mm_projector.2.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]).
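One possible reading of these shapes (my annotation, not stated in the thread): the checkpoint tensors are ordinary 2-D projector weights, while the "current model" reports flattened `[n, 1]` tensors exactly half the element count, which is consistent with bitsandbytes-style 4-bit packing (two 4-bit values per byte), i.e. the model being loaded quantized at inference time. A quick sanity check on the numbers:

```python
# Hypothesis (an assumption, not confirmed in the thread): the "current model"
# shapes match 4-bit packed storage, where an n-element tensor is stored as a
# uint8 tensor of shape [n // 2, 1] (two 4-bit values per byte).
packed_first = (4096 * 1024) // 2   # elements of mm_projector.0.weight, packed
packed_second = (4096 * 4096) // 2  # elements of mm_projector.2.weight, packed

# These equal the 2097152 and 8388608 reported in the traceback.
print(packed_first, packed_second)
```

If this reading is right, the fix in the last comment (loading without 4-bit/8-bit quantization) is exactly what resolves the mismatch.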

tarunmis avatar Feb 25 '24 07:02 tarunmis

I ran into the same problem. Has anyone found a solution? Much appreciated. @tarunmis

zhangye0402 avatar Mar 02 '24 01:03 zhangye0402

I think that with the default finetune_lora.sh script, the bf16 field is set to True, meaning training is done in 16-bit (bfloat16) by default. When loading the model for inference, setting load_4bit, load_8bit = False, False loads the model in full precision, after which the LoRA part is applied on top of the base model successfully (at least in my case).
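The fix above can be sketched as follows. This is a minimal, hedged example assuming Video-LLaVA's LLaVA-style builder module (`videollava.model.builder` and its `load_pretrained_model` function) and hypothetical checkpoint paths; adjust both to your setup:

```python
# Sketch: load the LoRA-finetuned Video-LLaVA model in full precision,
# i.e. with both quantized-loading flags disabled, so the mm_projector
# weights keep their original 2-D shapes and the LoRA adapter applies cleanly.
load_kwargs = {"load_8bit": False, "load_4bit": False}  # no quantization at load time

try:
    # Module path follows the LLaVA-style layout the repo uses (an assumption here).
    from videollava.model.builder import load_pretrained_model

    tokenizer, model, processor, context_len = load_pretrained_model(
        model_path="checkpoints/videollava-7b-lora",  # hypothetical LoRA output dir
        model_base="LanguageBind/Video-LLaVA-7B",     # base model used for finetuning
        model_name="videollava-7b-lora",
        **load_kwargs,
    )
except ImportError:
    # videollava is not installed in this environment; the kwargs above
    # still show the relevant flags.
    pass
```

The key point is simply that neither quantized-loading path is taken, so the state_dict shapes in the checkpoint match the freshly constructed model.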

henryyuanheng-wang avatar Mar 07 '24 17:03 henryyuanheng-wang