Video-LLaMA
finetune-billa7b-zh inference error: shape '[-1, 136]' is invalid for input of size 137
Hi, thank you very much for your great work! I encountered a problem while using the finetune-billa7b-zh model for inference. My configuration is as follows:
```yaml
model:
  arch: video_llama
  model_type: pretrain_vicuna
  freeze_vit: True
  freeze_qformer: True
  max_txt_len: 512
  end_sym: "###"
  low_resource: False
  frozen_llama_proj: False
  q_former_model: "pretrain_model/q_former_model/blip2_pretrained_flant5xxl.pth"
  vit_model: "pretrain_model/vit_model/eva_vit_g.pth"
  llama_model: "pretrain_model/BiLLa-7B-SFT"  # https://huggingface.co/Neutralzz/BiLLa-7B-SFT
  ckpt: "pretrain_model/video_llama_zh/finetune-billa7b-zh.pth"
  equip_audio_branch: False
  fusion_head_layers: 2
  max_frame_pos: 32
  fusion_header_type: "seqTransf"

datasets:
  webvid:
    vis_processor:
      train:
        name: "alpro_video_eval"
        n_frms: 8
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"

run:
  task: video_text_pretrain
```
Then I got this error:
```
File "Video-LLaMA/video_llama/models/modeling_llama.py", line 517, in forward
    position_ids = position_ids.view(-1, seq_length).long()
RuntimeError: shape '[-1, 136]' is invalid for input of size 137
```
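For context, the `RuntimeError` means `position_ids` holds 137 elements while the model computed `seq_length = 136`, and `view(-1, seq_length)` requires the total element count to be an exact multiple of `seq_length`. A minimal sketch of that constraint (using NumPy's `reshape`, which enforces the same rule as `torch.Tensor.view`; the numbers 137 and 136 come straight from the traceback):

```python
import numpy as np

# position_ids carries 137 elements, but the model expects rows of
# length seq_length = 136 -- an off-by-one mismatch, so the reshape
# to (-1, 136) cannot succeed (137 is not divisible by 136).
position_ids = np.arange(137)
seq_length = 136

failed = False
try:
    position_ids.reshape(-1, seq_length)  # same constraint as tensor.view
except ValueError:
    failed = True

print("reshape failed:", failed)  # True
print("remainder:", position_ids.size % seq_length)  # 1
```

In other words, the sequence being fed to the LLaMA forward pass is one token longer than the length the model derived (typically from the attention mask or input embeddings). This may point to a mismatch introduced during prompt/feature concatenation rather than a wrong value in the YAML itself, but that is only a guess from the shapes in the traceback.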
Can you tell me where I went wrong with my configuration? Thanks again!