tree-of-thought-prompting
Test dataset of questions to score reasoning
This does seem to improve prompting a lot, although a single question may not be very representative of the whole approach. To evaluate the suggested solutions properly, shall we create a test dataset of questions and score the answers we get from each prompt?
A test dataset would be a great idea.
There are quite a few frameworks for evaluating LLMs available now, for example https://github.com/openai/human-eval.
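Even before picking a framework, a rough sketch of the evaluation loop could look something like the snippet below. Everything here is a placeholder: `ask_model` stands in for whatever model call you use, the prompt templates and the single example question are just illustrative, and substring matching on the expected answer is a deliberately naive scoring rule.

```python
# Sketch: score several prompt templates against a shared question set.
# Replace ask_model with a real LLM call and DATASET with the actual test set.

PROMPTS = {
    "standard": "Answer the question.\n\nQ: {question}\nA:",
    "tree_of_thought": (
        "Imagine three different experts are answering this question.\n"
        "All experts write down one step of their thinking, then share it\n"
        "with the group, then everyone goes on to the next step.\n"
        "If any expert realises they're wrong at any point, they leave.\n\n"
        "Q: {question}\nA:"
    ),
}

# Tiny illustrative dataset; in practice this would be loaded from a file.
DATASET = [
    {
        "question": (
            "Bob is in the living room. He walks to the kitchen, carrying a cup. "
            "He puts a ball in the cup and carries the cup to the bedroom. "
            "He turns the cup upside down, then walks to the garden. "
            "Where is the ball?"
        ),
        "answer": "bedroom",
    },
]


def ask_model(prompt: str) -> str:
    """Placeholder: swap in an actual call to your model of choice."""
    return ""  # dummy reply so the sketch runs end to end


def score(template: str) -> float:
    """Fraction of questions whose expected answer appears in the reply."""
    correct = 0
    for item in DATASET:
        reply = ask_model(template.format(question=item["question"]))
        if item["answer"].lower() in reply.lower():
            correct += 1
    return correct / len(DATASET)


if __name__ == "__main__":
    for name, template in PROMPTS.items():
        print(f"{name}: {score(template):.0%}")
```

Once there are enough questions, the same loop would give a per-prompt accuracy figure, which is all we really need to compare the plain prompt against the tree-of-thought one.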