llm-leaderboard
A joint community effort to create one central leaderboard for LLMs.
 I cannot find any relevant evaluation in the [linked paper](https://arxiv.org/abs/2203.15556v1)... 
 As can be seen from https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval 
Apart from pass@1 and Elo rating, do the other benchmarks use only `accuracy` for evaluation? Yes, I think there are significant issues here, since most evaluation results are reported in each benchmark's own metric...
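One common workaround for the mixed-metric problem is to normalize each benchmark's column to [0, 1] before aggregating. A minimal sketch, assuming the leaderboard is held in a pandas DataFrame; the benchmark columns, model names, and scores below are hypothetical:

```python
import pandas as pd

# Hypothetical leaderboard: each benchmark reports its own metric
# (accuracy, pass@1, Elo), so the raw values are not comparable.
df = pd.DataFrame(
    {
        "MMLU (acc)": [0.67, 0.54, 0.43],
        "HumanEval (pass@1)": [0.48, 0.31, 0.18],
        "Chatbot Arena (Elo)": [1180, 1050, 980],
    },
    index=["model-a", "model-b", "model-c"],
)

# Min-max normalize every benchmark column to [0, 1] so a single
# aggregate score can be formed despite heterogeneous metrics.
normalized = (df - df.min()) / (df.max() - df.min())
df["mean_normalized"] = normalized.mean(axis=1)
print(df.sort_values("mean_normalized", ascending=False))
```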
Please make the table reader-friendly by freezing the header row and the first column (with the benchmark names) so that these stay in place when the table is...
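If the table is rendered from pandas, the built-in `Styler.set_sticky` can pin both the header row and the index column in the HTML output. A minimal sketch; the DataFrame below is hypothetical:

```python
import pandas as pd

# Hypothetical leaderboard with benchmark names as the index.
df = pd.DataFrame(
    {"model-a": [0.67, 0.48], "model-b": [0.54, 0.31]},
    index=["MMLU", "HumanEval"],
)

# set_sticky() pins the index and the header row so they stay in
# place while the rendered table scrolls.
html = (
    df.style
    .set_sticky(axis="index")    # freeze the first column
    .set_sticky(axis="columns")  # freeze the header row
    .to_html()
)
with open("leaderboard.html", "w") as f:
    f.write(html)
```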
It would be nice if the list were auto-sorted by a weighted average across all rankings on first page load. I would suggest using TrueSkill, which can rank players...
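A minimal sketch of that idea using the `trueskill` PyPI package, treating each benchmark's ranking as a multiplayer match (the model names and rankings below are hypothetical):

```python
import trueskill

# Hypothetical per-benchmark rankings, best model first.
rankings = [
    ["model-a", "model-b", "model-c"],  # e.g. MMLU
    ["model-b", "model-a", "model-c"],  # e.g. HumanEval
]

ratings = {m: trueskill.Rating() for r in rankings for m in r}

# Update ratings benchmark by benchmark. Each ranking is passed to
# rate() as one match; the default ranks assume best-first order.
for ranking in rankings:
    groups = [(ratings[m],) for m in ranking]
    updated = trueskill.rate(groups)
    for model, (new_rating,) in zip(ranking, updated):
        ratings[model] = new_rating

# Sort by the conservative estimate mu - 3*sigma for first page load.
leaderboard = sorted(
    ratings.items(),
    key=lambda kv: kv[1].mu - 3 * kv[1].sigma,
    reverse=True,
)
for model, rating in leaderboard:
    print(f"{model}: {rating.mu - 3 * rating.sigma:.2f}")
```

Sorting by mu - 3*sigma rather than mu alone penalizes models with few benchmark results, so a model can't top the list on one strong score.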
- A column for censored (yes/no)
- A column for model size in GB (0.4 for 400 MB)
- A column for whether the AI mentions it's an AI. (Some models, though I can only...
A source that might be of interest to this project: https://github.com/FreedomIntelligence/LLMZoo
HF [repo](https://huggingface.co/zhiqings/dromedary-65b-lora-delta-v0)
Hi, great work again. Is there any possibility of including these models' scores on the benchmark, if available? Anthropic's Claude models: https://www.anthropic.com/product Cohere's LLM: https://docs.cohere.com/docs/introduction-to-large-language-models AI21's Jurassic...
Check https://llm-leaderboard.streamlit.app