oumi
[Feature] Qwen VL 2.5 Evaluation via LM-Harness
Feature request
Add support for evaluating the Qwen 2.5 Vision-Language model in Oumi via LM-Harness, e.g., so that we can report MMMU scores on all subsets.
Motivation / references
- Implementing this feature will allow us to test a cutting-edge VL model on standard benchmarks like MMMU.
- The original related release of LM-Harness tested up to Qwen 2.0 VL models.
- To help you get started, feel free to look at or continue working from the branch optas/qwen_vl_2.5_eval. It already works for many subsets of MMMU.
- Training and inference with Qwen-2.5-VL have been supported since oumi v0.1.5.
Your contribution
Code-review, pair-programming
OPE-1082