state-of-open-source-ai
Different evaluation frameworks for LLMs
Type
new chapter
Chapter/Page
eval-datasets
Description
The evaluation page is really good. However, it would be great to add some information on the following evaluation frameworks:
- HELM by Stanford.
- LM Evaluation Harness by EleutherAI.
- Code Evaluation Harness by BigCode.
The content should mainly cover how each framework approaches evaluation and how to get started with it.