Fix Qwen Omni when use audio in video

Open Mercury7353 opened this issue 8 months ago • 3 comments

  1. Set the default nframe=None so that Qwen Omni can use its original video understanding utils.
  2. Add message type = audio to support separate video and audio inputs for QwenOmni.
  3. Unify self.use_audio_in_video for convenient control.
  4. Fix a bug in the existing code that did not pass audio info into the processor.
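
A rough sketch of how items 2 and 3 might fit together, for illustration only. The `split_media` helper, the returned keys, and the rule that a separate audio entry takes priority over the video's own audio track are all assumptions, not the code in this PR; only the `type`/`value` message style and the `use_audio_in_video` flag come from the description above.

```python
# Hypothetical sketch: `split_media` and its return keys are illustrative
# assumptions, not part of VLMEvalKit or of this PR.
def split_media(message, use_audio_in_video=True):
    """Group message items by modality and decide where the audio comes from."""
    videos = [m["value"] for m in message if m["type"] == "video"]
    audios = [m["value"] for m in message if m["type"] == "audio"]
    texts = [m["value"] for m in message if m["type"] == "text"]
    # Assumed policy: an explicit audio entry wins; otherwise fall back to
    # the audio track inside the video when the flag allows it.
    audio_from_video = use_audio_in_video and not audios and bool(videos)
    return {
        "videos": videos,
        "audios": audios,
        "text": "\n".join(texts),
        "audio_from_video": audio_from_video,
    }


# A message with separate video and audio inputs (file names are made up).
message = [
    {"type": "video", "value": "clip.mp4"},
    {"type": "audio", "value": "clip.wav"},
    {"type": "text", "value": "What is the speaker doing?"},
]
parsed = split_media(message, use_audio_in_video=True)
```

With a single flag controlling the fallback, the same decision can be reused wherever the video and audio are prepared, which is the kind of convenience item 3 appears to be aiming for.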

Mercury7353 avatar Apr 28 '25 04:04 Mercury7353

Hi @Mercury7353. Thank you for your contribution to our codebase. I have one question about "Set default nframe=None to help Qwen Omni use its original video understanding utils": our codebase uses the nframe setting from the video dataset and overrides the nframe setting defined in the Qwen model. Unless nframe is also set to None in the video dataset config, frames will be sampled according to the video dataset config. So, if we want to use the original video processing in Qwen-Omni, it would be best to pass only the video data_path (without nframe or fps) to the model, but that conflicts with our current setup, so we cannot do that.
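
The precedence described above can be sketched as follows. This is a minimal illustration, not the actual VLMEvalKit logic: the function name `resolve_nframe` and its arguments are hypothetical.

```python
# Hypothetical sketch of the precedence described above; `resolve_nframe`
# is an illustrative name, not a function in the codebase.
def resolve_nframe(dataset_nframe, model_nframe):
    """Return the frame count that ends up being used.

    None means "no fixed frame count", i.e. the model's own video
    processing decides how to sample the video.
    """
    # The video dataset config takes priority over the model config.
    if dataset_nframe is not None:
        return dataset_nframe
    return model_nframe


# Dataset asks for 32 frames: the model-level None is overridden.
with_dataset_setting = resolve_nframe(dataset_nframe=32, model_nframe=None)
# Both None: the model's original video handling would apply.
native_handling = resolve_nframe(dataset_nframe=None, model_nframe=None)
```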

Besides, what command did you use to replicate the WorldSense score for Qwen2.5-Omni? I would like to try it.

FangXinyu-0913 avatar Apr 28 '25 13:04 FangXinyu-0913

Yes. I have reproduced the Qwen-Omni score on WorldSense with this code. It is 45.5.

Mercury7353 avatar Apr 30 '25 06:04 Mercury7353

The command is:

    python run.py --data WorldSense_32frame --model Qwen2.5-Omni-7B

But I set nframe to None in the model config:

    "Qwen2.5-Omni-7B": partial(
        Qwen2VLChat,
        model_path="Qwen/Qwen2.5-Omni-7B",
        min_pixels=1280 * 28 * 28,
        max_pixels=16384 * 28 * 28,
        use_custom_prompt=False,
        use_audio_in_video=True,  # use the audio in the video
        nframe=None,  # disable nframe
    ),

Mercury7353 avatar Apr 30 '25 06:04 Mercury7353

Hello, the performance I measured on image-text datasets is much lower than the official results. Why is that? image

WenmuZhou avatar May 12 '25 09:05 WenmuZhou