[Feature] Qwen VL 2.5 Evaluation via LM-Harness

Open optas opened this issue 1 year ago • 0 comments

Feature request

Incorporate the evaluation of the Qwen 2.5 Vision-Language model in Oumi via LM-Harness.

For example, this would let us report MMMU scores on all subsets.
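As a rough illustration of the kind of run this request is after, the sketch below only assembles an LM-Harness command line; it does not use Oumi's own evaluation API. It assumes the harness's `hf-multimodal` model type and its `mmmu_val` task group, and the checkpoint name `Qwen/Qwen2.5-VL-3B-Instruct` is just an example. Whether this invocation actually works for Qwen 2.5 VL is precisely what this issue is about, and `build_lm_eval_command` is a hypothetical helper.

```python
import shlex


def build_lm_eval_command(pretrained, tasks, batch_size=1):
    """Assemble an lm-evaluation-harness CLI invocation as an argument list.

    Hypothetical helper for illustration; it only builds the command and
    never launches it.
    """
    return [
        "lm_eval",
        # Assumption: the harness's multimodal Hugging Face backend.
        "--model", "hf-multimodal",
        "--model_args", f"pretrained={pretrained}",
        # Assumption: "mmmu_val" is the task group covering the MMMU subsets.
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
        "--apply_chat_template",
    ]


# Example checkpoint name (an assumption, not taken from this issue).
cmd = build_lm_eval_command("Qwen/Qwen2.5-VL-3B-Instruct", ["mmmu_val"])
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` later, or to swap in a single MMMU subset while debugging.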

Motivation / references

  • Implementing this feature will allow us to test a cutting-edge VL model on standard benchmarks like MMMU.

  • The related LM-Harness release was tested only up to the Qwen 2.0 VL models.

  • To get a head start, feel free to look at or continue working from the branch optas/qwen_vl_2.5_eval. It already works for many subsets of MMMU.

  • Training and inference with Qwen-2.5-VL have been supported since oumi v0.1.5.

Your contribution

Code review, pair programming

OPE-1082

optas, Feb 20 '25 04:02