Integrate metrics team LLMaJ with current unitxt implementation
This PR integrates the metrics-llmaj pipeline with the current unitxt LLMaJ implementation. This required some changes outside the scope of our new catalog:
- Moving code from fm-eval to unitxt, including the changes from @arielge's old pull request for log probs inference/processors (https://github.com/IBM/unitxt/pull/1111), our tasks and templates, and some additional _infer_log_probs support.
- Changing the LLMAsJudge class to allow different processing of the input, specifically to let the template receive the individual dataset fields (answer, contexts, etc.) rather than the full template+response of the previous model. Ideally we'd have an LLMAsJudge parent class with derived classes, but to stay backward compatible I kept the LLMAsJudge API as before and worked around it for now by adding an LLMAsJudgeTaskFormatter class as an attribute. I hope we can agree on a better design together.
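
To make the workaround in the second bullet easier to discuss, here is a rough sketch of the "formatter as an optional attribute" idea. The names `JudgeSketch`, `TaskFormatterSketch` and `build_judge_input` are hypothetical stand-ins made up for illustration; they are not the actual unitxt classes or signatures in this PR.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaskFormatterSketch:
    """Hypothetical helper: builds the judge input from dataset fields."""

    template: str  # e.g. "Contexts: {contexts}\nAnswer: {answer}"

    def format(self, fields: Dict[str, str]) -> str:
        # Unused keys in `fields` are simply ignored by str.format.
        return self.template.format(**fields)


@dataclass
class JudgeSketch:
    """Hypothetical judge: same entry point with or without a formatter."""

    task_formatter: Optional[TaskFormatterSketch] = None

    def build_judge_input(
        self, rendered_prompt: str, response: str, fields: Dict[str, str]
    ) -> str:
        if self.task_formatter is None:
            # Backward-compatible path: full rendered template + response.
            return f"{rendered_prompt}\n{response}"
        # New path: the template sees the individual dataset fields.
        return self.task_formatter.format(fields)


fields = {"contexts": "Basic arithmetic.", "answer": "4"}
legacy_input = JudgeSketch().build_judge_input("What is 2+2?", "4", fields)
fields_input = JudgeSketch(
    TaskFormatterSketch("Contexts: {contexts}\nAnswer: {answer}")
).build_judge_input("What is 2+2?", "4", fields)
```

With a parent class and derived classes, the `if` branch would presumably move into an overridden method instead; the attribute version just keeps existing callers unchanged.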