Integrate metrics team LLMaJ with current unitxt implementation

Open lilacheden opened this issue 1 year ago • 0 comments

This integrates the metrics-llmaj pipeline with the current unitxt LLMaJ implementation. It required some changes outside the scope of our new catalog:

Moving code from fm-eval to unitxt, including the changes from @arielge's old pull request with the log-probs inference/processors (https://github.com/IBM/unitxt/pull/1111), our tasks and templates, and some more _infer_log_probs support.
Changing the LLMAsJudge class to allow different processing of the input, specifically to let the judge template receive the individual dataset fields (answer, contexts, etc.) rather than the full template+response of the previous model. Ideally we'd have an LLMAsJudge parent class with derived classes, but to stay backward compatible I kept the LLMAsJudge API as before and worked around it for now by adding an LLMAsJudgeTaskFormatter class as an attribute. Hope we can agree on a better design together.
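
To make the workaround easier to discuss, here is a rough sketch of the shape described above. It is not the actual unitxt code: the method names (`format`, `build_judge_input`), the constructor arguments, and the `task_data` dictionary layout are all assumptions made up for illustration. Only the class names LLMAsJudge and LLMAsJudgeTaskFormatter come from this issue.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class LLMAsJudgeTaskFormatter:
    """Hypothetical formatter that fills a judge template from dataset fields."""

    template: str            # e.g. "Q: {question}\nC: {contexts}\nA: {answer}"
    field_names: List[str]   # dataset fields to pass through to the template

    def format(self, task_data: Dict[str, Any], prediction: str) -> str:
        values = {}
        for name in self.field_names:
            value = task_data[name]
            # Lists (e.g. several contexts) are joined into one text block.
            values[name] = "\n".join(value) if isinstance(value, list) else value
        # Assumption: the evaluated model's response is exposed as "answer".
        values["answer"] = prediction
        return self.template.format(**values)


@dataclass
class LLMAsJudge:
    """Illustrative stand-in for the metric; the real class has more fields."""

    task_formatter: Optional[LLMAsJudgeTaskFormatter] = None

    def build_judge_input(
        self, rendered_input: str, prediction: str, task_data: Dict[str, Any]
    ) -> str:
        if self.task_formatter is None:
            # Previous behaviour: judge sees the full template + response.
            return f"{rendered_input}\n{prediction}"
        # New behaviour: judge template gets the individual dataset fields.
        return self.task_formatter.format(task_data, prediction)


formatter = LLMAsJudgeTaskFormatter(
    template="Q: {question}\nC: {contexts}\nA: {answer}",
    field_names=["question", "contexts"],
)
judge = LLMAsJudge(task_formatter=formatter)
print(
    judge.build_judge_input(
        "rendered prompt",
        "Paris",
        {"question": "Capital of France?", "contexts": ["Paris is the capital."]},
    )
)
```

Leaving `task_formatter` unset keeps the old input path, which is the backward-compatibility point. With a parent class and derived classes, the `if` branch would presumably move into an overridden method instead.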

Sep 08 '24 15:09 lilacheden