mlmm-evaluation

Multilingual Large Language Models Evaluation Benchmark

Results 13 mlmm-evaluation issues

Hello, I've been trying with different LLMs but I haven't been able to make it work. Could you shed some light? ```shell luispoveda93@LUIS-PC:~/mlmm-evaluation$ bash scripts/run.sh es microsoft/Phi-3-mini-4k-instruct Selected Tasks: ['arc_es',...

I set the language to English, but it fails to run. Is it possible to run with English MMLU?

Hello, I've been trying to run the framework using a model I installed with Ollama, but I haven't been able to. Maybe it's related to the model path,...

This PR adds the evaluation results for the [Jais13B](https://huggingface.co/core42/jais-13b) model on the ArabicArc dataset.

Thanks for open-sourcing this! I'm trying to evaluate `Llama-7b-hf` on `mmlu-fr`, a warning of `Token indices sequence length is longer than the specified maximum sequence length for this model...
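The warning quoted in this issue is the one Hugging Face tokenizers emit when an encoded prompt is longer than the model's maximum sequence length. As a rough illustration of one common way to handle over-long few-shot prompts, the sketch below keeps only the rightmost tokens so that the question closest to the answer survives. The helper name `truncate_left` and the sample values are hypothetical, and whether mlmm-evaluation truncates in this way is not stated in the issue.

```python
def truncate_left(token_ids, max_length):
    """Keep at most the last `max_length` token ids (drop the oldest context)."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return token_ids[-max_length:]


# Illustrative values only: a 10-token prompt and a 4-token limit.
prompt_ids = list(range(10))
trimmed = truncate_left(prompt_ids, 4)
print(len(trimmed))
```

Dropping tokens from the left removes the earliest few-shot examples first, which is usually less harmful than cutting off the question being evaluated.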

Dear authors, Thanks for your nice work. I am wondering whether you also translated the ARC-Easy dataset, as the bash download script currently only yields the ARC-Challenge dataset. I really...

Hi, Can I submit results one language at a time, or do I have to submit them all together? Thanks

Hello! Is there a way to control how many examples are used to evaluate the models? Also, how are the evaluations currently set up? Are all benchmarks (ARC, MMLU, HellaSwag)...

Hello, If you could add Azerbaijani (Arabic script) as a language at https://cohereforai-review-mmlu-translations.hf.space/dataset/97ce1e12-3204-4461-b865-b2fe1e879b95/annotation-mode?page=3&status=pending we can give you a large dataset. We have a big team for this language. Our latest paper....

The scripts look for `config.json` in the hf repo. But for fine-tuned / adapter models, that file is `adapter_config.json`, wherein I might also need to give the...
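The distinction raised in this issue can be illustrated with a small sketch. It assumes the list of file names in a repository is already available as plain strings; the helper names `classify_repo` and `base_model_from_adapter_config` are hypothetical and not part of mlmm-evaluation. The `base_model_name_or_path` key is the field PEFT writes into `adapter_config.json`; the model id in the sample data is a placeholder.

```python
import json


def classify_repo(filenames):
    """Return 'full', 'adapter', or 'unknown' based on which config file is present."""
    names = set(filenames)
    if "config.json" in names:
        return "full"
    if "adapter_config.json" in names:
        return "adapter"
    return "unknown"


def base_model_from_adapter_config(text):
    """Read the base model id from the contents of an adapter_config.json file."""
    return json.loads(text).get("base_model_name_or_path")


# Illustrative data only; the model id below is a placeholder.
sample_config = '{"base_model_name_or_path": "some-org/some-base-model", "peft_type": "LORA"}'
print(classify_repo(["adapter_config.json", "adapter_model.safetensors"]))
print(base_model_from_adapter_config(sample_config))
```

A loader built along these lines would first load the base model named in the adapter config and then apply the adapter weights on top of it.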