mlmm-evaluation

Multilingual Large Language Models Evaluation Benchmark

13 mlmm-evaluation issues

This PR adds evaluation results for the [Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) model on the ArabicArc dataset.

The evaluation is too slow: it takes 68 hours to run a single language.