Error(s) in loading state_dict for VideoLLAMA
Hi, I am running the demo with only the VL branch. I set the checkpoint paths like this:
llama_model: "model_weights/vicuna_final/"
ckpt: '/home/ubuntu/Documents/Video-LLaMA/model_weights/Pre-trained_Visual_Encoder/pretrained_minigpt4.pth' # you can use our pretrained ckpt from https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-2-13B-Pretrained/
equip_audio_branch: False
The weight files are organised like this: https://i.postimg.cc/CKCFK2ZD/files.png
When I run the command:
python demo_video.py \
--cfg-path eval_configs/video_llama_eval_only_vl.yaml \
--model_type vicuna \
--gpu-id 0
it raises an error:
Initializing Chat
Loading VIT
Loading VIT Done
Loading Q-Former
Using pad_token, but it is not set yet.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.13s/it]
Load first Checkpoint: /home/ubuntu/Documents/Video-LLaMA/model_weights/Pre-trained_Visual_Encoder/pretrained_minigpt4.pth
Traceback (most recent call last):
File "/home/ubuntu/Documents/Video-LLaMA/demo_video.py", line 67, in <module>
model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
File "/home/ubuntu/Documents/Video-LLaMA/video_llama/models/video_llama.py", line 608, in from_config
msg = model.load_state_dict(ckpt['model'], strict=False)
File "/home/ubuntu/anaconda3/envs/videollama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VideoLLAMA:
size mismatch for llama_proj.weight: copying a param with shape torch.Size([5120, 768]) from checkpoint, the shape in current model is torch.Size([4096, 768]).
size mismatch for llama_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
What is the problem? Thank you!
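For context on the shapes in the traceback: `llama_proj` maps 768-dim visual features into the LLaMA embedding space, so its output dimension must match the hidden size of the language model you load (4096 for a 7B LLaMA/Vicuna, 5120 for a 13B one). A sketch of that check, using the shapes from the error above (the size table is my assumption about the LLaMA variants, not something stated in the repo):

```python
# Hidden sizes of common LLaMA variants: 7B -> 4096, 13B -> 5120.
# llama_proj.weight has shape (llama_hidden_size, 768), so its first
# dimension tells you which LLaMA a checkpoint was pretrained against.
HIDDEN_SIZES = {4096: "7B", 5120: "13B"}

def infer_llama_variant(proj_weight_shape):
    """Given the shape of llama_proj.weight, return the matching LLaMA size."""
    out_dim, _in_dim = proj_weight_shape
    return HIDDEN_SIZES.get(out_dim, f"unknown ({out_dim})")

# The two shapes reported in the traceback:
print(infer_llama_variant((5120, 768)))  # checkpoint -> 13B
print(infer_llama_variant((4096, 768)))  # current model -> 7B
```

Reading the mismatch this way suggests the `ckpt` was pretrained against a 13B LLaMA while the weights in `model_weights/vicuna_final/` are a 7B model; the two need to agree.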