langtest

Explore MS promptbench

Open dcecchini opened this issue 7 months ago • 0 comments

Explore the new tool released by Microsoft for evaluation of LLMs.

Brief description:

It consists of a wide range of LLMs and evaluation datasets, covering diverse tasks, evaluation protocols, adversarial prompt attacks, and prompt engineering techniques. As a holistic library, it also supports several analysis tools for interpreting the results. It is designed in a modular fashion, allowing users to build evaluation pipelines for custom projects.

So, I think we should check which techniques they use to evaluate the models, as well as the datasets, tasks, and analysis tools for interpreting results that they support.

GitHub link: promptbench

dcecchini, Dec 18 '23 11:12