[Benchmark] Support MMReason
This PR adds evaluation support for the MMReason benchmark (accepted at ICCV 2025), which is designed to assess the reasoning capabilities of MLLMs.
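For context on what "support" entails here: benchmarks in VLMEvalKit are typically wired in as a dataset class that declares where the data TSV lives and how predictions are scored, and the class is then registered in `vlmeval/dataset/__init__.py`. The sketch below is only a hypothetical outline of that pattern, not the code in this PR; the class name, `DATASET_URL`/`DATASET_MD5` values, column names, and the exact-match scoring are all placeholder assumptions.

```python
# Hypothetical sketch of a VLMEvalKit dataset class; URLs, MD5s, and the
# scoring logic are placeholders, not the actual implementation in this PR.
from vlmeval.dataset.image_base import ImageBaseDataset
from vlmeval.smp import load, dump


class MMReason(ImageBaseDataset):
    TYPE = 'VQA'  # assumption: free-form reasoning answers
    DATASET_URL = {'MMReason_testmini': '<placeholder TSV URL>'}
    DATASET_MD5 = {'MMReason_testmini': '<placeholder md5>'}

    def evaluate(self, eval_file, **judge_kwargs):
        # Load the predictions produced by run.py and score them with a
        # simple exact match (placeholder for the real judging logic).
        data = load(eval_file)
        hit = [
            str(pred).strip().lower() == str(gt).strip().lower()
            for pred, gt in zip(data['prediction'], data['answer'])
        ]
        score = {'overall': sum(hit) / len(hit) * 100}
        result_file = eval_file.replace('.xlsx', '_acc.json')
        dump(score, result_file)
        return score
```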
Before submitting, I tested that the code runs successfully on Qwen2.5-VL and Qwen3-VL. The commands to run the evaluation are as follows:
```bash
python3 run.py --data MMReason_testmini --model Qwen2.5-VL-7B-Instruct --verbose
python3 run.py --data MMReason_testmini --model Qwen3-VL-8B-Instruct --verbose
```
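Since `run.py` accepts multiple values for `--data` and `--model`, the two checks above should also be runnable as a single invocation:

```bash
python3 run.py --data MMReason_testmini --model Qwen2.5-VL-7B-Instruct Qwen3-VL-8B-Instruct --verbose
```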