Open-Sora: caption_llava.py does not work

Open necrophagists opened this issue 1 year ago • 2 comments

I changed the LLM to llava-v1.6-mistral-7b, but the model's output is empty.

Mar 21 '24 11:03 necrophagists

Same here.

Mar 21 '24 18:03 DwanZhang-AI

Hi, I have already fixed this problem. The code is correct; the problem is that prompt['three frames'] is too long for mistral-7b (441 tokens). I switched to a shorter prompt ("Please describe the video with one paragraph"), and then it works.

Mar 22 '24 01:03 necrophagists
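
For anyone who wants to check prompt length before swapping models, here is a rough sketch that counts tokens per prompt and lists the ones over a budget. The names `PROMPTS`, `prompt_lengths`, `over_budget` and the `short` key are hypothetical and not part of Open-Sora, and the whitespace tokenizer is only a stand-in: pass the tokenizer of the model you actually use to get real counts.

```python
from typing import Callable, Dict, List


def whitespace_tokenize(text: str) -> List[str]:
    # Crude stand-in for a real tokenizer (assumption): a model tokenizer
    # will usually produce more tokens than a plain whitespace split.
    return text.split()


def prompt_lengths(
    prompts: Dict[str, str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> Dict[str, int]:
    # Map each prompt name to its token count under the given tokenizer.
    return {name: len(tokenize(text)) for name, text in prompts.items()}


def over_budget(
    prompts: Dict[str, str],
    max_tokens: int,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> List[str]:
    # Names of prompts whose token count exceeds max_tokens, sorted by name.
    lengths = prompt_lengths(prompts, tokenize)
    return sorted(name for name, n in lengths.items() if n > max_tokens)


# Hypothetical prompt table; the texts are the two prompts quoted in this thread.
PROMPTS = {
    "video": (
        "Describe this video and its style in a very detailed manner. "
        "Pay attention to all objects in the video. "
        "The description should be useful for AI to re-generate the video. "
        "The description should be no more than six sentences."
    ),
    "short": "Please describe the video",
}

if __name__ == "__main__":
    # The budget of 20 is an arbitrary illustration value, not a model limit.
    print(prompt_lengths(PROMPTS))
    print(over_budget(PROMPTS, max_tokens=20))
```

The budget is left as a parameter because the thread does not say where the actual limit comes from.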

Can you give an example?

Mar 22 '24 08:03 ersanliqiao

@necrophagists

Mar 22 '24 08:03 ersanliqiao

Try "Please describe the video".

Mar 22 '24 10:03 necrophagists

Should I use the 34B LLaVA model?

I am currently using an A100 40G.

I also edited the prompt (tools/caption/utils.py), from:

    "video": {
        "text": "Describe this video and its style in a very detailed manner. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences.",
        "type": "video",
    },

to:

    "video": {
        "text": "Please describe the video",
        "type": "video",
    },

Then I ran this command:

torchrun --nproc_per_node 1 --standalone -m tools.caption.caption_llava \
  /home/hed/Open-Sora/sample_data/sample_data_split/parts/meta_clips_info_fmin1_aes_part_0_aesmin3.0.csv \
  --model-path liuhaotian/llava-v1.6-vicuna-7b \
  --prompt video \
  --bs 8 \
  --tp-size 1 \
  --dp-size 1

The resulting CSV file looks like this:

    /home/hed/Open-Sora/sample_data/sample_data_split/big_buck_bunny_240p_2mb_scene-1.mp4,,69
    /home/hed/Open-Sora/sample_data/sample_data_split/big_buck_bunny_240p_1mb_scene-2.mp4,,10

Jul 24 '24 04:07 KihongK
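
To see how many clips came back without a caption, a small sketch like the one below can scan the output CSV. It assumes the caption is the second column, as the rows above suggest; `rows_with_empty_caption` and the sample paths are made up for illustration.

```python
import csv
import io
from typing import List


def rows_with_empty_caption(csv_text: str, caption_col: int = 1) -> List[str]:
    # Return the first field (the clip path) of every row whose caption
    # column is blank. caption_col=1 is an assumption based on the rows
    # shown in this thread (path, caption, third field).
    empty = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= caption_col:
            continue  # skip blank or short rows
        if not row[caption_col].strip():
            empty.append(row[0])
    return empty


# Made-up sample rows in the same three-column shape as the output above.
SAMPLE = (
    "clips/scene-1.mp4,,69\n"
    "clips/scene-2.mp4,a rabbit wakes up in a meadow,10\n"
)

if __name__ == "__main__":
    print(rows_with_empty_caption(SAMPLE))
```

For a real file, read it with `open(path, newline="").read()` and pass the text in.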
