
A joint community effort to create one central leaderboard for LLMs.

10 llm-leaderboard issues

![image](https://github.com/LudwigStumpp/llm-leaderboard/assets/8592144/fe4b9880-264a-467b-8ce1-78164d6fd773) I cannot find any relevant evaluation in the [linked paper](https://arxiv.org/abs/2203.15556v1)... ![image](https://github.com/LudwigStumpp/llm-leaderboard/assets/8592144/6a05ad1b-eeee-4758-abd6-d9564cf92aa7)

![image](https://github.com/LudwigStumpp/llm-leaderboard/assets/8592144/e656679e-36c4-4b63-9ca7-2a43d236755c) As can be seen from https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval ![image](https://github.com/LudwigStumpp/llm-leaderboard/assets/8592144/1d65e4e8-9ff0-4c26-9c93-4479dce8ceb3)

Except for pass@1 and Elo rating, do the other benchmarks use only `accuracy` for evaluation? Yes, I think this is a real issue, since most evaluation results use their own respective metrics...

Please make the table reader-friendly by freezing the header row and the first column (with the benchmark name) so that they stay in place when the table is...

It would be nice if the list were auto-sorted by a weighted average across all rankings on first page load. I would suggest using TrueSkill, which can rank players...
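The TrueSkill idea above could be prototyped without any dependencies. The sketch below is a simplified two-player, no-draw TrueSkill-style update in pure Python (the real algorithm lives in the `trueskill` package); the model names and match results are hypothetical, and the constants follow TrueSkill's usual defaults.

```python
import math

# Default TrueSkill-style prior: skill ~ N(MU0, SIGMA0^2)
MU0, SIGMA0 = 25.0, 25.0 / 3.0
BETA = SIGMA0 / 2.0  # per-game performance noise

def _pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def update(winner, loser):
    """winner/loser are (mu, sigma) tuples; returns the updated pair."""
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2.0 * BETA ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # how far to shift the means
    w = v * (v + t)         # how much to shrink the variances
    new_w = (mu_w + s_w ** 2 / c * v,
             s_w * math.sqrt(1.0 - s_w ** 2 / c ** 2 * w))
    new_l = (mu_l - s_l ** 2 / c * v,
             s_l * math.sqrt(1.0 - s_l ** 2 / c ** 2 * w))
    return new_w, new_l

# Hypothetical head-to-head result between two models:
model_a, model_b = (MU0, SIGMA0), (MU0, SIGMA0)
model_a, model_b = update(model_a, model_b)  # A beats B
print(model_a[0] > model_b[0])  # True: A's skill estimate is now higher
```

For the leaderboard itself, a common choice is to sort by the conservative estimate `mu - 3 * sigma`, so models with few recorded comparisons (high uncertainty) do not jump straight to the top.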

- A column for censored: yes/no
- A column for model size in GB (0.4 for 400 MB)
- A column for whether the AI mentions it's an AI. (Some models, though I can only...

A source that might be of interest to this project: https://github.com/FreedomIntelligence/LLMZoo

HF [repo](https://huggingface.co/zhiqings/dromedary-65b-lora-delta-v0)

Hi, great work again. Is there any possibility of including these models' scores on the benchmark, if available? Anthropic's Claude models: https://www.anthropic.com/product Cohere's LLMs: https://docs.cohere.com/docs/introduction-to-large-language-models AI21's Jurassic...

Check https://llm-leaderboard.streamlit.app