unitxt: Update RAG LLM-as-judge metrics to support the llama-3-3-70b model on WML

Adds support for the llama-3-3-70b model from WML.

Oct 09 '25 11:10 piotrhelm

Hi @piotrhelm, you need to delete the old Llama 3.1 (3_1) judge JSON files from the repo:

/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_answer_correctness_q_a_gt_loose.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_answer_relevance_q_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_correctness_holistic_q_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_answer_correctness_q_a_gt_loose_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_answer_relevance_q_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_faithfulness_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_faithfulness_q_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_faithfulness_q_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_correctness_holistic_q_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_faithfulness_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_context_relevance_q_c_ares.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_context_relevance_q_c_ares_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/correctness_holistic/llama_3_1_70b_instruct_wml_q_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/correctness_holistic/llama_3_1_70b_instruct_wml_q_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/correctness_holistic/llama_3_1_70b_instruct_wml_q_c_a_numeric.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/faithfulness/llama_3_1_70b_instruct_wml_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/faithfulness/llama_3_1_70b_instruct_wml_q_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/faithfulness/llama_3_1_70b_instruct_wml_q_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/faithfulness/llama_3_1_70b_instruct_wml_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/faithfulness/llama_3_1_70b_instruct_wml_c_a_verbal.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/faithfulness/llama_3_1_70b_instruct_wml_q_c_a_verbal.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/answer_correctness/llama_3_1_70b_instruct_wml_q_a_gt_loose_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/answer_correctness/llama_3_1_70b_instruct_wml_q_a_gt_loose.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/answer_correctness/llama_3_1_70b_instruct_wml_q_a_gt_loose_numeric.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/context_relevance/llama_3_1_70b_instruct_wml_q_c_ares_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/context_relevance/llama_3_1_70b_instruct_wml_q_c_ares_numeric.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/context_relevance/llama_3_1_70b_instruct_wml_q_c_ares.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/answer_relevance/llama_3_1_70b_instruct_wml_q_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/answer_relevance/llama_3_1_70b_instruct_wml_q_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/answer_relevance/llama_3_1_70b_instruct_wml_q_a_numeric.json

main()
File "/home/runner/work/unitxt/unitxt/utils/prepare_all_artifacts.py", line 198, in main
raise RuntimeError(
RuntimeError: Branch's catalog is different from the total production of branch's prepare files. See details in the logs.
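The CI check fails because these catalog files are no longer produced by the branch's prepare files, so they have to be removed by hand. A minimal Python sketch of that cleanup, assuming it is run from the repository root and that every stale file name contains `llama_3_1_70b_instruct_wml` (true of every path in the log above). The helper names `find_stale_judges` and `delete_files` are hypothetical and not part of unitxt:

```python
"""Hypothetical helper: find and delete stale Llama 3.1 WML judge JSONs."""
from __future__ import annotations

from pathlib import Path

# Assumption: every stale catalog file name contains this marker.
STALE_MARKER = "llama_3_1_70b_instruct_wml"


def find_stale_judges(catalog_root: Path, marker: str = STALE_MARKER) -> list[Path]:
    """Return JSON files under catalog_root whose name contains marker, sorted."""
    return sorted(
        p for p in catalog_root.rglob("*.json") if p.is_file() and marker in p.name
    )


def delete_files(paths: list[Path], dry_run: bool = True) -> list[Path]:
    """Print what would be deleted (dry run) or delete each file; return the paths."""
    for path in paths:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return paths


if __name__ == "__main__":
    # Assumed location, taken from the CI paths: src/unitxt/catalog/metrics
    catalog = Path("src/unitxt/catalog/metrics")
    if catalog.is_dir():
        stale = find_stale_judges(catalog)
        delete_files(stale, dry_run=True)  # set to False after reviewing the list
```

Running with `dry_run=True` first makes it easy to compare the printed list against the CI output before anything is removed; the deletions still have to be committed to the branch before the check runs again.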

Oct 12 '25 09:10 yoavkatz

@yoavkatz Done.

Oct 15 '25 08:10 piotrhelm

Closing, as this was merged in https://github.com/IBM/unitxt/pull/1948

Dec 01 '25 09:12 piotrhelm