Fix Qwen Omni when using audio in video
- Set the default nframe=None so that Qwen Omni can use its original video understanding utils.
- Add message type = audio to support separate video and audio inputs for Qwen Omni.
- Unify self.use_audio_in_video for convenient control.
- Fix a bug in the existing code where audio info was not passed to the processor.
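As an illustration of the separate audio input described above, here is a minimal sketch of a message that carries video and audio as distinct items. It assumes each item is a dict with `type` and `value` keys; the helper `split_message` and the file names are hypothetical and are not taken from this PR.

```python
# Hypothetical sketch: `split_message` and the file names are illustrative only.
def split_message(message):
    """Group message items by modality so each can be routed separately."""
    grouped = {"text": [], "video": [], "audio": []}
    for item in message:
        kind = item["type"]
        if kind not in grouped:
            raise ValueError(f"unsupported message type: {kind}")
        grouped[kind].append(item["value"])
    return grouped


# A message with separate video and audio entries (assumed layout).
message = [
    {"type": "video", "value": "clip.mp4"},
    {"type": "audio", "value": "clip.wav"},
    {"type": "text", "value": "What is the speaker doing?"},
]
grouped = split_message(message)
```

With items kept apart like this, the audio entry can be handed to the processor on its own instead of being silently dropped.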
Hi @Mercury7353. Thank you for your contribution to our codebase. There is still one issue I would like to ask about:
Regarding "Set default nframe=None to help Qwen Omni use its original video understanding utils": our codebase uses the nframe setting from the video dataset config and overrides the nframe setting defined in the Qwen model. Unless nframe is also set to None in the video dataset config, frames will be sampled according to the dataset config.
So if we want to use the original video processing in Qwen-Omni, it would be better to pass only the video data path (without nframe and fps) to the model, but that conflicts with our current setup, so we cannot do that.
Also, what command did you use to reproduce the WorldSense score with Qwen2.5-Omni? I would like to try it.
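The precedence described in this comment can be sketched roughly as follows. This is only an illustration: `resolve_nframe` is a hypothetical helper, not a function in the codebase, and the fallback to the model's value when the dataset sets nothing is an assumption.

```python
# Hypothetical sketch of the nframe precedence; not the codebase's actual logic.
def resolve_nframe(dataset_nframe, model_nframe):
    """Return the frame count to sample, or None to defer to the model's own video pipeline."""
    if dataset_nframe is not None:
        # The dataset config overrides whatever the model config says.
        return dataset_nframe
    # Assumed fallback: use the model's setting, which may itself be None.
    return model_nframe


from_dataset = resolve_nframe(32, None)   # dataset config wins
native = resolve_nframe(None, None)       # nothing set, model handles the raw video
```

Under this reading, setting nframe=None only in the model config has no effect while the dataset config still specifies a frame count.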
Yes, I have reproduced the Qwen-Omni score on WorldSense with this code. It is 45.5.
The command is:
python run.py --data WorldSense_32frame --model Qwen2.5-Omni-7B
But I set nframe to None in the model config:
"Qwen2.5-Omni-7B": partial(
    Qwen2VLChat,
    model_path="Qwen/Qwen2.5-Omni-7B",
    min_pixels=1280 * 28 * 28,
    max_pixels=16384 * 28 * 28,
    use_custom_prompt=False,
    use_audio_in_video=True,  # use the audio track in the video
    nframe=None,  # disable nframe
),
Hello, the performance I get when evaluating on image-text datasets is much lower than the official results. Why is that?