
LLM as judge ensemble

Open pvn25 opened this issue 1 year ago • 1 comments

  • The judge templates are defined at prepare/templates/response_assessment/judges
  • The judge artifacts are available as JSON files at prepare/templates/response_assessment/judges/config_judges.json and prepare/templates/response_assessment/judges/ensemble_relevance_v1.json. If it is possible to register these JSON objects in the catalog, please let me know how I can do that.
  • src/unitxt/processors.py defines a new string output processor
  • src/unitxt/metrics.py defines two new ensemble-based metrics that inherit from MetricsEnsemble and deserialize logistic/random-forest models (from JSON strings)
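
As a rough illustration of the last point, the following is a minimal sketch of deserializing a logistic model from a JSON string and using it to combine judge scores. The JSON layout (`coef`, `intercept`) and the function names are assumptions made for illustration only; they are not taken from the actual code in src/unitxt/metrics.py.

```python
import json
import math

# Hypothetical JSON layout for a serialized logistic model (an assumption,
# not the format used in the PR): one weight per judge plus an intercept.
model_json = '{"coef": [1.5, -0.5], "intercept": 0.25}'


def load_logistic(json_str):
    """Parse the JSON string and return (weights, intercept)."""
    params = json.loads(json_str)
    return params["coef"], params["intercept"]


def predict_proba(coef, intercept, judge_scores):
    """Combine per-judge scores into one probability with a sigmoid."""
    z = intercept + sum(w * s for w, s in zip(coef, judge_scores))
    return 1.0 / (1.0 + math.exp(-z))


coef, intercept = load_logistic(model_json)
# Two hypothetical judges: the first scored 1.0, the second 0.0.
ensemble_score = predict_proba(coef, intercept, [1.0, 0.0])
```

Storing only the fitted parameters as JSON keeps the metric free of pickled objects, at the cost of re-implementing the prediction step.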

@yoavkatz @eladven

pvn25 avatar Jul 30 '24 03:07 pvn25

You need to add

evaluate_ensemble_judge.py

to the excluded_files list

excluded_files = [
    "use_llm_as_judge_metric.py",
    "standalone_evaluation_llm_as_judge.py",
    "evaluate_summarization_dataset_llm_as_judge.py",
    "evaluate_different_formats.py",
    "evaluate_different_templates.py",
    "evaluate_different_demo_selections.py",
    "evaluate_a_judge_model_capabilities_on_arena_hard.py",
    "evaluate_a_model_using_arena_hard.py",
    "evaluate_llm_as_judge.py",
    "evaluate_using_metrics_ensemble.py",
    "evaluate_existing_dataset_by_llm_as_judge.py",
]

in unitxt/tests/library/test_examples.py.

Without it, the regression test tries to run your example, which requires an IBM GenAI API key.
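
The effect of such an exclusion list can be sketched as follows. This is only an illustration of the filtering idea, not the actual logic in test_examples.py; the helper `files_to_run`, the shortened list, and the example paths (including `hypothetical_offline_example.py`) are all assumptions.

```python
from pathlib import Path

# Shortened, illustrative exclusion list; the last entry is the one
# this comment asks to add.
excluded_files = [
    "evaluate_llm_as_judge.py",
    "evaluate_ensemble_judge.py",
]


def files_to_run(all_files, excluded):
    """Return the example files whose base name is not excluded."""
    excluded_set = set(excluded)
    return [f for f in all_files if Path(f).name not in excluded_set]


# Hypothetical example paths used only for this sketch.
examples = [
    "examples/evaluate_llm_as_judge.py",
    "examples/evaluate_ensemble_judge.py",
    "examples/hypothetical_offline_example.py",
]
runnable = files_to_run(examples, excluded_files)
print(runnable)
```

Only the example that needs no external API key remains in `runnable`, so the regression run does not fail for lack of credentials.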

yoavkatz avatar Aug 15 '24 21:08 yoavkatz