
Different evaluation frameworks for LLMs

Open · opened by Anindyadeep · 0 comments

Type

new chapter

Chapter/Page

eval-datasets

Description

The evaluation page is really good. However, it would be awesome if we could add some information on the following evaluation frameworks:

  1. HELM by Stanford.
  2. LM Evaluation Harness by EleutherAI.
  3. BigCode Evaluation Harness by BigCode.

The content should mainly cover how each framework approaches evaluation and how to get started with it.
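For the "getting started" part, a minimal sketch for the LM Evaluation Harness could look something like this (package name, model, and flags are illustrative and should be checked against the project's current README):

```shell
# Install EleutherAI's LM Evaluation Harness (PyPI name assumed: lm-eval)
pip install lm-eval

# Quick smoke test: evaluate a small Hugging Face model on one task,
# capping the number of examples with --limit so it finishes fast
lm_eval --model hf \
  --model_args pretrained=gpt2 \
  --tasks hellaswag \
  --limit 10
```

Similar short "install and run one benchmark" snippets for HELM and the BigCode Evaluation Harness would make the chapter immediately actionable.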

Anindyadeep · Oct 30 '23 14:10