[Feature] Qwen VL 2.5 Evaluation via LM-Harness

Open optas opened this issue 1 year ago • 0 comments

Incorporate the evaluation of the Qwen 2.5 Vision-Language model in Oumi via LM-Harness.

For example, this would let us report MMMU scores on all subsets.
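Reporting on all subsets typically means listing an accuracy per subset plus one overall figure. The following is a minimal sketch of that aggregation, assuming per-subset results are available as (correct, total) counts. The helper name `summarize_subsets` is hypothetical, and the subset counts are made-up placeholders rather than real evaluation results; this is not taken from Oumi or LM-Harness.

```python
from typing import Dict, Tuple


def summarize_subsets(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return per-subset accuracy plus a sample-weighted overall accuracy.

    `counts` maps a subset name to (num_correct, num_total).
    Hypothetical helper for illustration; not part of any library.
    """
    if not counts:
        raise ValueError("no subset results given")
    summary: Dict[str, float] = {}
    total_correct = 0
    total_seen = 0
    for name, (correct, total) in sorted(counts.items()):
        if total <= 0 or not 0 <= correct <= total:
            raise ValueError(f"invalid counts for {name!r}")
        summary[name] = correct / total
        total_correct += correct
        total_seen += total
    # Weight by sample count so larger subsets contribute proportionally.
    summary["overall"] = total_correct / total_seen
    return summary


# Placeholder numbers for illustration only; NOT real evaluation results.
example_counts = {"Accounting": (12, 30), "Art": (21, 30)}
example_summary = summarize_subsets(example_counts)
for subset, accuracy in example_summary.items():
    print(f"{subset}: {accuracy:.3f}")
```

The overall score here is weighted by sample count rather than averaged across subsets; an unweighted mean is an equally plausible convention, so whichever is used should be stated alongside the numbers.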

Implementing this feature will allow us to test a cutting-edge VL model on standard benchmarks like MMMU.
The related LM-Harness release was originally tested only up to the Qwen 2.0 VL models.
To get a head start, feel free to look at or continue working from the branch optas/qwen_vl_2.5_eval. It already works for many subsets of MMMU.
Training and inference with Qwen-2.5-VL have been supported since oumi v0.1.5.

Code-review, pair-programming

OPE-1082

Feb 20 '25 04:02 optas