Video-LLaMA
finetune-billa7b-zh inference error: shape '[-1, 136]' is invalid for input of size 137
Hi, thank you very much for your great work! I encountered a problem while using the finetune-billa7b-zh model for inference. My configuration is as follows:
```yaml
model:
  arch: video_llama
  model_type: pretrain_vicuna
  freeze_vit: True
  freeze_qformer: True
  max_txt_len: 512
  end_sym: "###"
  low_resource: False
  frozen_llama_proj: False
  q_former_model: "pretrain_model/q_former_model/blip2_pretrained_flant5xxl.pth"
  vit_model: "pretrain_model/vit_model/eva_vit_g.pth"
  llama_model: "pretrain_model/BiLLa-7B-SFT"  # https://huggingface.co/Neutralzz/BiLLa-7B-SFT
  ckpt: "pretrain_model/video_llama_zh/finetune-billa7b-zh.pth"
  equip_audio_branch: False
  fusion_head_layers: 2
  max_frame_pos: 32
  fusion_header_type: "seqTransf"

datasets:
  webvid:
    vis_processor:
      train:
        name: "alpro_video_eval"
        n_frms: 8
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"

run:
  task: video_text_pretrain
```
Then I got this error:
```
File "Video-LLaMA/video_llama/models/modeling_llama.py", line 517, in forward
    position_ids = position_ids.view(-1, seq_length).long()
RuntimeError: shape '[-1, 136]' is invalid for input of size 137
```
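For context, the `RuntimeError` means `position_ids` holds 137 elements while the model computed `seq_length = 136`, and `view(-1, seq_length)` requires the total element count to be an exact multiple of `seq_length`. A minimal sketch of that constraint (using NumPy's `reshape`, which enforces the same rule as `torch.Tensor.view`; the numbers 137 and 136 come straight from the traceback):

```python
import numpy as np

# position_ids carries 137 elements, but the model expects rows of
# length seq_length = 136 -- an off-by-one mismatch, so the reshape
# to (-1, 136) cannot succeed (137 is not divisible by 136).
position_ids = np.arange(137)
seq_length = 136

failed = False
try:
    position_ids.reshape(-1, seq_length)  # same constraint as tensor.view
except ValueError:
    failed = True

print("reshape failed:", failed)  # True
print("remainder:", position_ids.size % seq_length)  # 1
```

In other words, the sequence being fed to the LLaMA forward pass is one token longer than the length the model derived (typically from the attention mask or input embeddings). This may point to a mismatch introduced during prompt/feature concatenation rather than a wrong value in the YAML itself, but that is only a guess from the shapes in the traceback.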
Can you tell me where I went wrong with my configuration? Thanks again!