evals icon indicating copy to clipboard operation
evals copied to clipboard

Idea for Evals: Complex, multi-turn instruction-following Evals

Open andrew-openai opened this issue 1 year ago • 2 comments

Hello everyone, thank you for contributions so far, I've been working through them and these tasks are forming a challenging a comprehensive benchmark for modern LLMs and LLM programs. We worked on Completion Functions last week, which we're glad to have merged (more info coming), and I'll be returning to merging Eval submissions this week.

I want to try an idea where people can open issues that describe ideas for evals, tagged with the Idea For Eval label. If anyone has some relevant data or thinks it would be interesting to tackle that idea, they can open an Eval contribution PR with the same tag. Anyone is free to open Idea for Eval issues, especially if you are building an application or have a field of study for which you'd think getting some Evals could help your development process.

I'll start with an Idea, we know that our models can struggle with complex multi-turn instructions, especially if the instructions are domain relevant. If you have any tasks like this, please open a PR with the eval and we'll merge it into the benchmark.

We're reducing the 100 sample limit on contributions to 15, in order to encourage more one-off samples or handwritten example contributions like this.

Thanks!

andrew-openai avatar Apr 11 '23 06:04 andrew-openai