Yoav Katz

74 comments by Yoav Katz

Working on it in https://github.com/IBM/unitxt/pull/1508/

This is an important point to address. It has to be in the docs and not only the code.

Should remain open.

Hi Dafna. Indeed, the requirement to register classes is cumbersome. If I understand correctly, this proposed solution searches for a class in the unitxt folders. Unitxt allows extension...

I think the way things worked before is that once you imported a class, it was registered. So if I extended unitxt and did from mymodule import MyMetric, it would...

Hi @piotrhelm, you need to delete the old 3_1 judge JSON files from the repo:
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_answer_correctness_q_a_gt_loose.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_answer_relevance_q_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_correctness_holistic_q_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_answer_correctness_q_a_gt_loose_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_answer_relevance_q_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_faithfulness_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_faithfulness_q_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_faithfulness_q_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_correctness_holistic_q_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_faithfulness_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_context_relevance_q_c_ares.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/llm_as_judge/binary/llama_3_1_70b_instruct_wml_context_relevance_q_c_ares_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/correctness_holistic/llama_3_1_70b_instruct_wml_q_c_a_logprobs.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/correctness_holistic/llama_3_1_70b_instruct_wml_q_c_a.json
/home/runner/work/unitxt/unitxt/src/unitxt/catalog/metrics/rag/correctness_holistic/llama_3_1_70b_instruct_wml_q_c_a_numeric.json...
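A small helper like the following could locate and remove those files. It is a sketch under assumptions: the catalog root (src/unitxt/catalog/metrics, relative to the repo root) and the filename prefix llama_3_1_70b_instruct_wml are inferred from the paths listed above, and the helper name delete_catalog_files is hypothetical. Because a prefix match may also catch files that should stay, it defaults to a dry run that only lists the matches.

```python
from pathlib import Path
from typing import List


def delete_catalog_files(catalog_root: Path, prefix: str, dry_run: bool = True) -> List[Path]:
    """Find *.json files under catalog_root whose file name starts with prefix.

    With dry_run=True (the default) nothing is deleted; the matches are only
    returned so they can be reviewed first. With dry_run=False they are removed.
    """
    matches = sorted(
        p for p in catalog_root.rglob("*.json") if p.name.startswith(prefix)
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Assumed location of the catalog, relative to the repository root.
    root = Path("src/unitxt/catalog/metrics")
    for path in delete_catalog_files(root, "llama_3_1_70b_instruct_wml", dry_run=True):
        print(path)
```

Once the printed list matches the files above, rerun with dry_run=False and commit the deletions.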

Thanks @algadhib for providing such a clear way to reproduce the issue. I checked, and it occurs in past versions as well (at least 1.14.0), at least with the current models and...

From what I see, the metric does not necessarily return a score between 0 and 1, but rather between -1 and 1. This is because at its core it does...
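If downstream code expects scores in [0, 1], one option (an assumption for illustration, not something the metric currently does) is the linear rescaling s' = (s + 1) / 2, which maps -1 to 0, 0 to 0.5, and 1 to 1. A hypothetical helper:

```python
def rescale_to_unit_interval(score: float) -> float:
    """Linearly map a score from [-1, 1] to [0, 1] using (score + 1) / 2.

    Hypothetical helper; raises if the input is outside the expected range
    instead of silently clipping it.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"expected a score in [-1, 1], got {score}")
    return (score + 1.0) / 2.0
```

Note that this changes how the score reads: a raw score of 0 becomes 0.5, so any thresholds tuned on the raw scale would need to be converted as well.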