tree-of-thought-prompting

Test dataset of questions to score reasoning

Open · sapph1re opened this issue 2 years ago · 1 comment

This indeed improves prompting considerably, although a single question may not be very representative of the approach as a whole. To evaluate the suggested prompts properly, shall we create a test dataset of questions and score the results each prompt produces?
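For illustration, here is a rough sketch of what such a scoring harness could look like. The question list and the `ask_llm()` helper are placeholders (not part of this repo), and the tree-of-thought template is only loosely adapted from the README's three-experts example:

```python
"""Rough sketch of a prompt-scoring harness (placeholder names, not part of this repo)."""
import json

# Hypothetical question set; real entries would need unambiguous expected answers.
QUESTIONS = [
    {
        "question": (
            "Bob is in the living room. He walks to the kitchen, carrying a cup. "
            "He puts a ball in the cup and carries the cup to the bedroom. "
            "He turns the cup upside down, then walks to the garden. "
            "He puts the cup down in the garden, then walks to the garage. "
            "Where is the ball?"
        ),
        "answer": "bedroom",
    },
    # ... more questions
]

PROMPTS = {
    "plain": "{question}",
    "tree_of_thought": (
        "Imagine three different experts are answering this question. "
        "All experts will write down 1 step of their thinking, then share it "
        "with the group. Then all experts will go on to the next step, etc. "
        "If any expert realises they're wrong at any point then they leave. "
        "The question is: {question}"
    ),
}


def ask_llm(prompt: str) -> str:
    """Placeholder: swap in a real call to whichever model is being tested."""
    raise NotImplementedError


def score(template: str) -> float:
    """Fraction of questions whose expected answer appears in the model's reply."""
    correct = sum(
        1
        for item in QUESTIONS
        if item["answer"].lower()
        in ask_llm(template.format(question=item["question"])).lower()
    )
    return correct / len(QUESTIONS)


if __name__ == "__main__":
    print(json.dumps({name: score(tpl) for name, tpl in PROMPTS.items()}, indent=2))
```

Substring matching is crude; beyond toy questions we'd probably want a stricter comparison or a judging model to score the replies.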

sapph1re · Aug 09 '23

A test dataset would be a great idea.

There are many frameworks available now for testing LLMs, such as https://github.com/openai/human-eval.
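For reference, human-eval's workflow (roughly as in its README) writes model completions to a JSONL file and then scores them against unit tests; `generate_one_completion` below is a placeholder. Note it targets code generation rather than reasoning puzzles, so a custom question set would still be needed for the kind of questions discussed here:

```python
from human_eval.data import write_jsonl, read_problems

problems = read_problems()


def generate_one_completion(prompt: str) -> str:
    """Placeholder: return the model's code completion for the given prompt."""
    raise NotImplementedError


# One sample per task for brevity; the README generates many samples per task.
samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)

# Then, from the shell:
#   evaluate_functional_correctness samples.jsonl
```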

dave1010 · Aug 09 '23