Error(s) in loading state_dict for VideoLLAMA
Hi, I am running the demo with only the VL branch. I set the checkpoint paths like this:
llama_model: "model_weights/vicuna_final/"
ckpt: '/home/ubuntu/Documents/Video-LLaMA/model_weights/Pre-trained_Visual_Encoder/pretrained_minigpt4.pth' # you can use our pretrained ckpt from https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-2-13B-Pretrained/
equip_audio_branch: False
The weight files are organised like this: https://i.postimg.cc/CKCFK2ZD/files.png
When I run the command:
python demo_video.py \
--cfg-path eval_configs/video_llama_eval_only_vl.yaml \
--model_type vicuna \
--gpu-id 0
it raises an error:
Initializing Chat
Loading VIT
Loading VIT Done
Loading Q-Former
Using pad_token, but it is not set yet.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.13s/it]
Load first Checkpoint: /home/ubuntu/Documents/Video-LLaMA/model_weights/Pre-trained_Visual_Encoder/pretrained_minigpt4.pth
Traceback (most recent call last):
File "/home/ubuntu/Documents/Video-LLaMA/demo_video.py", line 67, in <module>
model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
File "/home/ubuntu/Documents/Video-LLaMA/video_llama/models/video_llama.py", line 608, in from_config
msg = model.load_state_dict(ckpt['model'], strict=False)
File "/home/ubuntu/anaconda3/envs/videollama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VideoLLAMA:
size mismatch for llama_proj.weight: copying a param with shape torch.Size([5120, 768]) from checkpoint, the shape in current model is torch.Size([4096, 768]).
size mismatch for llama_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
What is the problem? Thank you!
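For context on the shapes in the traceback: `llama_proj` maps 768-dim visual features into the LLaMA embedding space, so its output dimension must match the hidden size of the language model you load (4096 for a 7B LLaMA/Vicuna, 5120 for a 13B one). A sketch of that check, using the shapes from the error above (the size table is my assumption about the LLaMA variants, not something stated in the repo):

```python
# Hidden sizes of common LLaMA variants: 7B -> 4096, 13B -> 5120.
# llama_proj.weight has shape (llama_hidden_size, 768), so its first
# dimension tells you which LLaMA a checkpoint was pretrained against.
HIDDEN_SIZES = {4096: "7B", 5120: "13B"}

def infer_llama_variant(proj_weight_shape):
    """Given the shape of llama_proj.weight, return the matching LLaMA size."""
    out_dim, _in_dim = proj_weight_shape
    return HIDDEN_SIZES.get(out_dim, f"unknown ({out_dim})")

# The two shapes reported in the traceback:
print(infer_llama_variant((5120, 768)))  # checkpoint -> 13B
print(infer_llama_variant((4096, 768)))  # current model -> 7B
```

Reading the mismatch this way suggests the `ckpt` was pretrained against a 13B LLaMA while the weights in `model_weights/vicuna_final/` are a 7B model; the two need to agree.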