mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
Results: 13
mlmm-evaluation issues
This PR adds the evaluation results for the [Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) model on the ArabicArc dataset.
The evaluation is too slow. It takes 68 hours for a single language.