Support for evaluating multiple LLMs in parallel with the same task & dataset configuration
This is often helpful as a preliminary first step to understand which LLM performs best on a user's specific task and data.
It would be really neat if we were able to create a report for a user's dataset, similar to the benchmark we put together here: https://www.refuel.ai/blog-posts/llm-labeling-technical-report
Would this be another function (maybe `test_llms(list_of_llms)`) that we could add to `LabelingAgent`, or would this be a config change where, instead of a single model config, we could provide a list of model configs to benchmark?
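
For illustration, here is a minimal sketch of how the function-based option might look today, assuming the existing `LabelingAgent(config)` constructor and its `run()` method (whose exact signature and return value vary by autolabel version). The `benchmark_llms` helper, the candidate model list, and the config values are hypothetical, not part of the current API:

```python
# Hypothetical sketch: benchmark several LLMs by looping over model configs.
# The config keys mirror autolabel's single-model config format; the exact
# run() signature may differ depending on the installed version.
from copy import deepcopy

from autolabel import LabelingAgent

base_config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification",
    "dataset": {"label_column": "label", "delimiter": ","},
    "prompt": {
        "task_guidelines": "Classify the comment as toxic or not toxic.",
        "labels": ["toxic", "not toxic"],
    },
    # "model" is filled in per candidate below
}

# Candidate LLMs to compare on the same task & dataset (example values)
candidate_models = [
    {"provider": "openai", "name": "gpt-3.5-turbo"},
    {"provider": "openai", "name": "gpt-4"},
    {"provider": "anthropic", "name": "claude-2"},
]


def benchmark_llms(base_config, models, dataset_path):
    """Hypothetical helper: run the same labeling task once per model config."""
    results = {}
    for model in models:
        config = deepcopy(base_config)
        config["model"] = model
        agent = LabelingAgent(config)
        # Collect whatever the installed version's run() returns (labeled
        # output and/or metrics) so the models can be compared afterwards.
        results[model["name"]] = agent.run(dataset_path)
    return results


report = benchmark_llms(base_config, candidate_models, "test.csv")
```

The config-based alternative would instead accept `"model": [ ... ]` as a list and have the agent iterate internally, which keeps the user-facing API unchanged but pushes the comparison/report logic into the library.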