
What is the difference between sy1998/MLVU_dev and MLVU/MVLU?

Open Cola-any opened this issue 8 months ago • 8 comments

Hello, and thank you for sharing this work. I have a question: why is the MLVU-Dev dataset provided in this repository different from the dataset used by lmms-eval (sy1998/MLVU_dev)? Is there any difference between the two? Because of this mismatch, I ran into a lot of trouble when evaluating with lmms-eval.

Cola-any avatar Apr 16 '25 08:04 Cola-any

The videos and test questions contained in these two repositories are consistent; the only difference lies in the data organization. The latter is a reorganized version intended for integration with the lmms-eval evaluation framework.

JUNJIE99 avatar Apr 16 '25 12:04 JUNJIE99

https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/tasks/mlvu/mlvu_dev.yaml Please follow the dataset link in the yaml to download the corresponding data if you want to evaluate with lmms-eval. Thanks!
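For convenience, a minimal download sketch, assuming `huggingface_hub` is installed; the repo id below is the one named in this thread, and `local_dir` is an arbitrary choice (check the yaml for the exact dataset id):

```python
from huggingface_hub import snapshot_download

# Download the reorganized lmms-eval version of the dataset.
snapshot_download(
    repo_id="sy1998/MLVU_dev",   # dataset id mentioned in this thread
    repo_type="dataset",
    local_dir="./MLVU_dev",      # arbitrary local target folder
)
```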

shuyansy avatar Apr 16 '25 15:04 shuyansy

Thank you guys. I got it.

Cola-any avatar Apr 17 '25 01:04 Cola-any

Should I extract all the files from MLVU_dev/video_part_1.zip to video_part_8.zip into the same folder?

SplendidYuan avatar Apr 22 '25 07:04 SplendidYuan

https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/tasks/mlvu/utils.py Yes, it is necessary. You can also print video_path in this file to make sure the paths resolve correctly.
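A minimal sketch of the extraction step, assuming the eight archives named above sit in `./MLVU_dev` and everything should land in one `video` folder; both folder names are assumptions, so match them to whatever path `utils.py` builds:

```python
import zipfile
from pathlib import Path

src = Path("./MLVU_dev")      # where the video_part_*.zip files were downloaded
dst = src / "video"           # single target folder (assumed name)
dst.mkdir(parents=True, exist_ok=True)

# Extract all eight parts into the same folder.
for i in range(1, 9):
    with zipfile.ZipFile(src / f"video_part_{i}.zip") as zf:
        zf.extractall(dst)

# Quick sanity check mirroring the "print the video_path" advice above
# (assumes the videos are .mp4 files).
print(f"{sum(1 for _ in dst.rglob('*.mp4'))} mp4 files under {dst}")
```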

shuyansy avatar Apr 22 '25 08:04 shuyansy

👍🥰

SplendidYuan avatar Apr 22 '25 12:04 SplendidYuan

I used lmms-eval to evaluate the models LLaVA_OneVision-qwen2-7b-ov and LLaVA-Video-7B-Qwen2 on MLVU_dev, but the results were 1% to 4% lower than those reported in the original paper. What could be the reason for this? Could it be due to changes in the data structure?

SplendidYuan avatar May 08 '25 08:05 SplendidYuan

> I used lmms-eval to evaluate the models LLaVA_OneVision-qwen2-7b-ov and LLaVA-Video-7B-Qwen2 on MLVU_dev, but the results were 1% to 4% lower than those reported in the original paper. What could be the reason for this? Could it be due to changes in the data structure?

Same here. Trying different prompts might help.
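If you want to experiment with prompts, here is a minimal sketch for inspecting what mlvu_dev.yaml currently feeds the model. It assumes a local lmms-eval checkout and that the prompt strings live under `lmms_eval_specific_kwargs` (verify against the actual file); lmms-eval task yamls can contain custom tags such as `!function`, which `yaml.safe_load` rejects, hence the lenient loader:

```python
import yaml

class LenientLoader(yaml.SafeLoader):
    pass

# Map every unrecognized tag (e.g. "!function utils.some_helper") to None so
# the rest of the task config still parses.
LenientLoader.add_multi_constructor("", lambda loader, tag_suffix, node: None)

with open("lmms_eval/tasks/mlvu/mlvu_dev.yaml") as f:
    cfg = yaml.load(f, Loader=LenientLoader)

# Inspect the prompt wording actually sent to the model; small changes to the
# answer-format instruction can shift multiple-choice accuracy by a few
# points, which is the scale of the gap discussed above.
print(cfg.get("lmms_eval_specific_kwargs"))
```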

wh-xu1 avatar Sep 09 '25 12:09 wh-xu1