evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
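The simplest evals in this style pair each prompt with an ideal answer and score the model's completions by exact match. A minimal sketch of that idea, with a stub in place of a real model call (the function and field names here are illustrative, not the framework's actual API):

```python
# Exact-match evaluation: each sample pairs a prompt with an "ideal"
# answer, a completion function produces the model's answer, and
# accuracy is the metric. The lambda below is a stub model.

def exact_match_eval(samples, complete):
    """Return accuracy of `complete` over (input, ideal) samples."""
    correct = 0
    for sample in samples:
        answer = complete(sample["input"]).strip()
        if answer == sample["ideal"]:
            correct += 1
    return correct / len(samples)

samples = [
    {"input": "2 + 2 = ?", "ideal": "4"},
    {"input": "Capital of France?", "ideal": "Paris"},
]

# A stub model that always answers "4" gets exactly one sample right.
accuracy = exact_match_eval(samples, lambda prompt: "4")
print(accuracy)  # 0.5
```

In practice the completion function wraps an API call and the samples come from a JSONL dataset, but the scoring loop is this simple.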
I am trying to execute the Building an MMLU Eval Jupyter notebook. All of the cells execute correctly until I run the following command: `!oaieval gpt-3.5-turbo match_mmlu_anatomy`. I receive the...
The set of pull requests seems to be growing pretty fast, so I'm not confident I should add another for such a small thing. The question is, "The day before yesterday, Chris was...
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
## Eval details 📑
### Eval name
binary_count
### Eval description
Asks the model to count the 1s in a binary string of length 10-100.
### What makes this a useful eval?
...
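Samples for a counting eval like this are easy to generate programmatically, since the ideal answer can be computed from the string itself. A short sketch of how such samples might be built (the field names and prompt wording are illustrative):

```python
import random

def make_binary_count_sample(length, rng):
    """Build one binary_count-style sample: a random binary string
    and, as the ideal answer, the count of '1' characters in it."""
    bits = "".join(rng.choice("01") for _ in range(length))
    return {
        "input": f"Count the 1s in this binary string: {bits}",
        "ideal": str(bits.count("1")),
    }

rng = random.Random(0)
# String lengths between 10 and 100, as the eval description says.
samples = [make_binary_count_sample(rng.randint(10, 100), rng)
           for _ in range(5)]
for sample in samples:
    print(sample["input"], "->", sample["ideal"])
```

Because the ground truth is computed rather than hand-labeled, the dataset can be regenerated at any size or length distribution.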
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
Evaluate GPT's ability to generate SVG code for shapes from text inputs.
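One automated check an SVG-generation eval can run is whether the model's output is even well-formed: parse it as XML and confirm the root element is `<svg>`. A minimal sketch of that check (it validates structure only, not whether the drawing matches the requested shape):

```python
import xml.etree.ElementTree as ET

def is_well_formed_svg(svg_text):
    """Return True if svg_text parses as XML with an <svg> root."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # ElementTree expands namespaced tags to "{uri}localname",
    # so strip any namespace prefix before comparing.
    return root.tag.split("}")[-1] == "svg"

good = ('<svg xmlns="http://www.w3.org/2000/svg">'
        '<circle cx="5" cy="5" r="4"/></svg>')
bad = "<svg><circle></svg>"  # unclosed <circle> -> parse error

print(is_well_formed_svg(good))  # True
print(is_well_formed_svg(bad))   # False
```

Scoring whether the rendered shape actually matches the prompt would need a separate check (e.g. inspecting element names and attributes, or rendering and comparing images).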