
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

428 evals issues, sorted by recently updated

I am trying to execute the Building an MMLU Eval Jupyter notebook. All of the cells execute correctly until I run the following command: `!oaieval gpt-3.5-turbo match_mmlu_anatomy`. I receive the...
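For reference, a minimal sketch of that same invocation outside the notebook, assuming the evals package is installed and an API key is exported (the actual error in the issue is truncated above, so this only illustrates the setup the command expects):

```python
import os
import subprocess

# The oaieval CLI needs an OpenAI API key; a missing key is a common setup issue.
assert os.environ.get("OPENAI_API_KEY"), "export OPENAI_API_KEY before running oaieval"

# Same command the notebook runs via the `!` shell escape:
# oaieval <completion_fn> <eval_name>
subprocess.run(["oaieval", "gpt-3.5-turbo", "match_mmlu_anatomy"], check=True)
```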

The set of pull requests seems to be growing pretty fast, so I'm not confident I should add another for such a small thing. The question is: "The day before yesterday, Chris was...

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines; failure to follow the guidelines below will result in the PR being closed...

## Eval details 📑

### Eval name
binary_count

### Eval description
Makes the model count the 1s in a binary string 10-100 characters long.

### What makes this a useful eval?...
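For context, registry samples for a match-style eval like this are JSONL lines with an `input` chat prompt and an `ideal` answer. A hedged sketch of a generator for such samples follows; the prompt wording and file name are assumptions, not taken from the PR:

```python
import json
import random

def make_sample(rng: random.Random) -> dict:
    # Binary string of length 10-100, per the eval description (assumed inclusive).
    length = rng.randint(10, 100)
    bits = "".join(rng.choice("01") for _ in range(length))
    return {
        "input": [
            {"role": "system", "content": "Count the number of 1s in the binary string. Answer with only the count."},
            {"role": "user", "content": bits},
        ],
        "ideal": str(bits.count("1")),
    }

rng = random.Random(0)  # seeded for reproducible samples
with open("samples.jsonl", "w") as f:
    for _ in range(100):
        f.write(json.dumps(make_sample(rng)) + "\n")
```

Counting tasks like this are attractive as evals precisely because the ideal answer is mechanically derivable, so a simple exact-match check suffices.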

Evaluates GPT's ability to generate SVG code for shapes from text inputs.
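The PR text is truncated, but exact string matching is a poor fit for generated SVG; one hypothetical programmatic check (not from the PR) would parse the model's output and look for the requested shape element:

```python
import xml.etree.ElementTree as ET

def contains_shape(svg_text: str, shape_tag: str) -> bool:
    # Hypothetical grader: parse the model's SVG output and check whether
    # the requested shape element (e.g. "circle", "rect") appears anywhere.
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False  # malformed SVG fails the check outright
    ns = "{http://www.w3.org/2000/svg}"
    return any(el.tag in (shape_tag, ns + shape_tag) for el in root.iter())

# Example: grade a completion for the prompt "draw a red circle".
print(contains_shape('<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>', "circle"))  # True
```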
