LLM as judge ensemble
- The judge templates are defined at `prepare/templates/response_assessment/judges`.
- The judge artifacts are available as JSON files at `prepare/templates/response_assessment/judges/config_judges.json` and `prepare/templates/response_assessment/judges/ensemble_relevance_v1.json`. If it is possible to register these JSON objects to the catalog, please let me know how I can do that (a possible approach is sketched after this list).
- `src/unitxt/processors.py` defines a new string output processor.
- `src/unitxt/metrics.py` defines two new ensemble-based metrics that inherit from `MetricsEnsemble` and deserialize logistic/random-forest models from JSON strings (a deserialization sketch follows this list).
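On the catalog question: in unitxt prepare scripts, the usual pattern is to construct the artifact in Python and register it with `add_to_catalog`, which serializes it into the catalog as JSON. Whether the existing JSON files can be loaded directly depends on their schema, so the template below is only an illustrative stand-in, not the actual judge configuration from this PR:

```python
from unitxt.catalog import add_to_catalog
from unitxt.templates import InputOutputTemplate

# Illustrative judge template; the real fields live in config_judges.json /
# ensemble_relevance_v1.json and will differ.
judge_template = InputOutputTemplate(
    input_format="Question: {question}\nAnswer: {answer}\nIs the answer relevant?",
    output_format="{rating}",
)

# Registers the artifact under the given catalog name; prepare scripts
# conventionally pass overwrite=True so re-running them is idempotent.
add_to_catalog(
    judge_template,
    "templates.response_assessment.judges.ensemble_relevance_v1",
    overwrite=True,
)
```

Once registered, the artifact can be referenced elsewhere by its catalog name (here `templates.response_assessment.judges.ensemble_relevance_v1`).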
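On the new metrics: below is a minimal sketch of the deserialization step, rebuilding a fitted scikit-learn logistic model from a JSON string. This illustrates the general technique only, not the PR's actual `MetricsEnsemble` subclasses; the `coef`/`intercept`/`classes` keys are assumptions, not the schema used in `src/unitxt/metrics.py`:

```python
import json

import numpy as np
from sklearn.linear_model import LogisticRegression


def logistic_model_from_json(model_json: str) -> LogisticRegression:
    """Rebuild a fitted LogisticRegression from a JSON string.

    The key names below are illustrative assumptions, not the PR's schema.
    """
    params = json.loads(model_json)
    model = LogisticRegression()
    # Restore the fitted attributes that predict/predict_proba rely on.
    model.coef_ = np.asarray(params["coef"])
    model.intercept_ = np.asarray(params["intercept"])
    model.classes_ = np.asarray(params["classes"])
    return model


# Example: a tiny binary judge serialized as a JSON string.
serialized = json.dumps(
    {"coef": [[0.8, -1.2]], "intercept": [0.1], "classes": [0, 1]}
)
judge = logistic_model_from_json(serialized)
print(judge.predict_proba([[1.0, 0.5]]))  # probabilities used as ensemble scores
```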
@yoavkatz @eladven
You need to add `evaluate_ensemble_judge.py` to the `excluded_files` list in `unitxt/tests/library/test_examples.py`:

```python
excluded_files = [
    "use_llm_as_judge_metric.py",
    "standalone_evaluation_llm_as_judge.py",
    "evaluate_summarization_dataset_llm_as_judge.py",
    "evaluate_different_formats.py",
    "evaluate_different_templates.py",
    "evaluate_different_demo_selections.py",
    "evaluate_a_judge_model_capabilities_on_arena_hard.py",
    "evaluate_a_model_using_arena_hard.py",
    "evaluate_llm_as_judge.py",
    "evaluate_using_metrics_ensemble.py",
    "evaluate_existing_dataset_by_llm_as_judge.py",
    "evaluate_ensemble_judge.py",  # new entry for this PR
]
```

Without it, the regression tries to run your example (which requires an IBM GenAI API key).
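For context, here is a rough sketch of how such an exclusion list is typically consumed; this is an assumption about the test's structure, not the actual code in `test_examples.py`:

```python
import glob
import os
import subprocess

# Assumed shape of the regression: run every example script unless excluded.
excluded_files = ["evaluate_ensemble_judge.py"]  # plus the entries listed above

for path in sorted(glob.glob(os.path.join("examples", "*.py"))):
    if os.path.basename(path) in excluded_files:
        continue  # skipped: needs external credentials (e.g., an IBM GenAI API key)
    subprocess.run(["python", path], check=True)
```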