What is the difference between sy1998/MLVU_dev and MLVU/MVLU?
Hello, thank you for sharing this work. I have a question: why is the MLVU-Dev dataset provided in this repository organized differently from the dataset used by lmms-eval (sy1998/MLVU_dev)? Is there any substantive difference between the two? This discrepancy cost me a lot of time when setting up evaluation with lmms-eval.
The videos and test questions in these two repositories are identical; the only difference is the data organization. The latter is a reorganized version intended for integration with the lmms-eval evaluation framework.
https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/tasks/mlvu/mlvu_dev.yaml Please follow the dataset link in this yaml to download the corresponding tasks if you want to evaluate with lmms-eval. Thanks!
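For anyone else setting this up, here is a minimal sketch of pulling the reorganized dataset from the Hugging Face Hub with `huggingface_hub`. The repo id `sy1998/MLVU_dev` is taken from the discussion above, and the local directory name is only an example; check the yaml for the authoritative dataset path.

```python
# Minimal sketch: download the reorganized MLVU_dev dataset from the Hugging Face Hub.
# Assumes the dataset repo id is sy1998/MLVU_dev (as referenced above); adjust local_dir as needed.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="sy1998/MLVU_dev",
    repo_type="dataset",      # it is a dataset repo, not a model repo
    local_dir="./MLVU_dev",   # example target directory
)
print("Downloaded to:", local_dir)
```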
Thank you guys. I got it.
Should I extract all the files from MLVU_dev/video_part_1.zip to video_part_8.zip into the same folder?
https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/tasks/mlvu/utils.py Yes, that is necessary. You can also print the video_path in this file to make sure the paths resolve correctly.
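For reference, a rough sketch of that extraction step. The zip names follow the video_part_1.zip ... video_part_8.zip naming above; the source and target folders are just example paths, not paths prescribed by the repo.

```python
# Rough sketch: unpack video_part_1.zip .. video_part_8.zip into a single folder,
# then list a few extracted files to confirm the layout matches what utils.py expects.
import zipfile
from pathlib import Path

src_dir = Path("./MLVU_dev")        # example: folder containing the downloaded zip parts
out_dir = Path("./MLVU_dev/video")  # example: single target folder for all videos
out_dir.mkdir(parents=True, exist_ok=True)

for i in range(1, 9):
    part = src_dir / f"video_part_{i}.zip"
    with zipfile.ZipFile(part) as zf:
        zf.extractall(out_dir)      # everything goes into the same folder

# quick sanity check: print a handful of extracted video paths
for p in sorted(out_dir.rglob("*.mp4"))[:5]:
    print(p)
```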
👍🥰
I used lmms-eval to evaluate the models LLaVA_OneVision-qwen2-7b-ov and LLaVA-Video-7B-Qwen2 on MLVU_dev, but the results were 1% to 4% lower than those reported in the original paper. What could be the reason for this? Could it be due to changes in the data structure?
Me too; maybe trying different prompts would help.