evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
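For orientation, a minimal sketch of the typical workflow (assuming an installed `evals` package and an `OPENAI_API_KEY` in the environment; `test-match` is one of the registry's built-in sanity-check evals):

```bash
# Install the framework (from PyPI, or `pip install -e .` from a clone).
pip install evals

# Run a registered eval against a model; results are logged locally.
oaieval gpt-3.5-turbo test-match
```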
### Discussed in https://github.com/openai/evals/discussions/621

Originally posted by **55255ru**, April 10, 2023:

Hello. I suggest using what is written on my website (http://www.55255.ru) to improve GPT-4 because I...
It could be interesting to explore if we could use [MusPy](https://salu133445.github.io/muspy/) to add some text/symbolic music evals. /cc @salu133445
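One way this could look: have the model emit music in a textual format MusPy can parse (ABC notation here), then grade the output with MusPy's objective metrics. A rough sketch, assuming MusPy's ABC reader accepts the tune; `score_abc_completion` and any grading threshold built on it are hypothetical:

```python
import os
import tempfile

import muspy


def score_abc_completion(abc_text: str) -> dict:
    """Parse a model-generated ABC tune and compute MusPy's
    objective symbolic-music metrics for grading."""
    # MusPy's readers take file paths, so round-trip through a temp file.
    with tempfile.NamedTemporaryFile("w", suffix=".abc", delete=False) as f:
        f.write(abc_text)
        path = f.name
    try:
        music = muspy.read(path)  # format inferred from the .abc extension
    finally:
        os.unlink(path)
    return {
        "n_pitches_used": muspy.n_pitches_used(music),
        "pitch_entropy": muspy.pitch_entropy(music),
        # Fraction of notes fitting the best major/minor scale, in [0, 1].
        "scale_consistency": muspy.scale_consistency(music),
    }
```

A grader could then threshold these scores, e.g. pass when `scale_consistency` clears some cutoff; the cutoff would be an eval-design choice, not anything MusPy prescribes.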
Hi, it would be cool to evaluate all OpenAI models on the Beyond the Imitation Game Benchmark (BIG-bench), a collaborative benchmark intended to probe large language models and extrapolate their...
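If BIG-bench tasks were run under this framework, one route is converting a task's JSON examples into the `samples.jsonl` format that the basic match-style eval classes consume. A sketch, assuming a BIG-bench-style JSON file whose `examples` entries carry `input` and `target` fields (field names vary per task, so real conversion would need per-task handling):

```python
import json


def bigbench_to_samples(task_json_path: str, samples_jsonl_path: str) -> None:
    """Convert a BIG-bench-style JSON task into an evals samples.jsonl file.

    Each output line holds chat-format "input" messages plus an "ideal"
    answer, the shape evals' basic Match eval expects.
    """
    with open(task_json_path) as f:
        task = json.load(f)
    with open(samples_jsonl_path, "w") as out:
        for ex in task["examples"]:
            sample = {
                "input": [
                    {"role": "system", "content": task.get("task_prefix", "")},
                    {"role": "user", "content": ex["input"]},
                ],
                "ideal": ex["target"],
            }
            out.write(json.dumps(sample) + "\n")
```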
Hello everyone, thank you for the contributions so far. I've been working through them, and these tasks are forming a challenging and comprehensive benchmark for modern LLMs and LLM programs. We...
Noticed that a few actions used in the workflows here are outdated; proposing a Dependabot configuration to keep them updated (reference: https://docs.github.com/en/actions/security-guides/using-githubs-security-features-to-secure-your-use-of-github-actions#keeping-the-actions-in-your-workflows-secure-and-up-to-date). Current workflow executions show a deprecation notice, e.g. https://github.com/openai/evals/actions/runs/8903656117 >...
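For reference, a Dependabot configuration along the lines proposed would live at `.github/dependabot.yml`; the weekly cadence here is an assumption, not part of the proposal:

```yaml
version: 2
updates:
  # Keep the actions referenced in .github/workflows up to date.
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
```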
### Describe the feature or improvement you're requesting

Please add support for GPT-4o for evaluation.

### Additional context

_No response_
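Once the model name is wired into the framework's model registry, running it should reduce to the usual CLI invocation; a sketch assuming `gpt-4o` resolves to a chat completion function:

```bash
oaieval gpt-4o test-match
```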
# Thank you for contributing an eval! ♥️

🚨 Please make sure your PR follows these guidelines; **failure to follow the guidelines below will result in the PR being closed**...