LLM as judge ensemble
- The judge templates are defined at `prepare/templates/response_assessment/judges`.
- The judge artifacts are available as JSON files at `prepare/templates/response_assessment/judges/config_judges.json` and `prepare/templates/response_assessment/judges/ensemble_relevance_v1.json`. If it is possible to register these JSON objects to the catalog, please let me know how I can do that (a possible approach is sketched after this list).
- `src/unitxt/processors.py` defines a new string output processor.
- `src/unitxt/metrics.py` defines two new ensemble-based metrics that inherit from `MetricsEnsemble` and deserialize logistic/random-forest models from JSON strings (a deserialization sketch follows this list).
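On the catalog question: in unitxt prepare scripts, the usual pattern is to construct the artifact in Python and register it with `add_to_catalog`, which serializes it into the catalog as JSON. Whether the existing JSON files can be loaded directly depends on their schema, so the template below is only an illustrative stand-in, not the actual judge configuration from this PR:

```python
from unitxt.catalog import add_to_catalog
from unitxt.templates import InputOutputTemplate

# Illustrative judge template; the real fields live in config_judges.json /
# ensemble_relevance_v1.json and will differ.
judge_template = InputOutputTemplate(
    input_format="Question: {question}\nAnswer: {answer}\nIs the answer relevant?",
    output_format="{rating}",
)

# Registers the artifact under the given catalog name; prepare scripts
# conventionally pass overwrite=True so re-running them is idempotent.
add_to_catalog(
    judge_template,
    "templates.response_assessment.judges.ensemble_relevance_v1",
    overwrite=True,
)
```

Once registered, the artifact can be referenced elsewhere by its catalog name (here `templates.response_assessment.judges.ensemble_relevance_v1`).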
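On the new metrics: below is a minimal sketch of the deserialization step, rebuilding a fitted scikit-learn logistic model from a JSON string. This illustrates the general technique only, not the PR's actual `MetricsEnsemble` subclasses; the `coef`/`intercept`/`classes` keys are assumptions, not the schema used in `src/unitxt/metrics.py`:

```python
import json

import numpy as np
from sklearn.linear_model import LogisticRegression


def logistic_model_from_json(model_json: str) -> LogisticRegression:
    """Rebuild a fitted LogisticRegression from a JSON string.

    The key names below are illustrative assumptions, not the PR's schema.
    """
    params = json.loads(model_json)
    model = LogisticRegression()
    # Restore the fitted attributes that predict/predict_proba rely on.
    model.coef_ = np.asarray(params["coef"])
    model.intercept_ = np.asarray(params["intercept"])
    model.classes_ = np.asarray(params["classes"])
    return model


# Example: a tiny binary judge serialized as a JSON string.
serialized = json.dumps(
    {"coef": [[0.8, -1.2]], "intercept": [0.1], "classes": [0, 1]}
)
judge = logistic_model_from_json(serialized)
print(judge.predict_proba([[1.0, 0.5]]))  # probabilities used as ensemble scores
```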
@yoavkatz @eladven
You need to add `evaluate_ensemble_judge.py` to the `excluded_files` list in `unitxt/tests/library/test_examples.py`:

```python
excluded_files = [
    "use_llm_as_judge_metric.py",
    "standalone_evaluation_llm_as_judge.py",
    "evaluate_summarization_dataset_llm_as_judge.py",
    "evaluate_different_formats.py",
    "evaluate_different_templates.py",
    "evaluate_different_demo_selections.py",
    "evaluate_a_judge_model_capabilities_on_arena_hard.py",
    "evaluate_a_model_using_arena_hard.py",
    "evaluate_llm_as_judge.py",
    "evaluate_using_metrics_ensemble.py",
    "evaluate_existing_dataset_by_llm_as_judge.py",
    "evaluate_ensemble_judge.py",  # new entry for this PR
]
```

Without it, the regression tries to run your example (which requires an IBM GenAI API key).
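For context, here is a rough sketch of how such an exclusion list is typically consumed; this is an assumption about the test's structure, not the actual code in `test_examples.py`:

```python
import glob
import os
import subprocess

# Assumed shape of the regression: run every example script unless excluded.
excluded_files = ["evaluate_ensemble_judge.py"]  # plus the entries listed above

for path in sorted(glob.glob(os.path.join("examples", "*.py"))):
    if os.path.basename(path) in excluded_files:
        continue  # skipped: needs external credentials (e.g., an IBM GenAI API key)
    subprocess.run(["python", path], check=True)
```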