evals icon indicating copy to clipboard operation
evals copied to clipboard

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Results 428 evals issues
Sort by recently updated
recently updated
newest added

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

Bug fix to the basic.includes eval: If a ref in sample["ideal"] is a single character, `evals.elsuite.utils.get_answer` can return an empty string if the ref is found in the last character...

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

## Eval details 📑 ### Eval name `repeat_consonants` ### Eval description Tests the model's ability to repeat consonants x number of times in a given text. ### What makes this...

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

## Eval details 📑 ### Eval name last-word-nth ### Eval description Test the model's ability to tell what the last word of a sentence is, but by asking it indirectly...