unitxt: Understanding how the metrics are calculated

Hi, I am looking at the Multiple-choice QA example provided here. I wanted to know how the accuracy is calculated. Do we (1) ask the model to generate the answer, post-process the output to extract the answer, and compare it with the ground truth, or (2) create four sequences, each ending with one of the four options, compute the log-likelihood of each sequence, and return the option whose sequence has the highest likelihood score?
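Option (2) above, log-likelihood option selection, can be sketched in a few lines of plain Python. This is not unitxt's implementation: `select_option_by_loglik` and `toy_logprobs` are hypothetical names, and `toy_logprobs` stands in for a real language model that would return one log-probability per token of the continuation given the prompt.

```python
import math
from typing import Callable, List, Sequence


def select_option_by_loglik(
    prompt: str,
    options: Sequence[str],
    token_logprobs: Callable[[str, str], List[float]],
) -> str:
    """Hypothetical helper (not unitxt's API): pick the most likely option.

    `token_logprobs(prompt, continuation)` is assumed to return one
    log-probability per token of `continuation` given `prompt`.
    """
    # Score each candidate sequence "prompt + option" by its total log-likelihood.
    scores = [sum(token_logprobs(prompt, option)) for option in options]
    # The option with the best score is returned as a textual prediction.
    best_index = max(range(len(options)), key=lambda i: scores[i])
    return options[best_index]


def toy_logprobs(prompt: str, continuation: str) -> List[float]:
    """Hypothetical stand-in for a language model; it simply favours "Paris"."""
    prob = 0.8 if continuation == "Paris" else 0.05
    # Treat each whitespace-separated word as one token.
    return [math.log(prob)] * len(continuation.split())


if __name__ == "__main__":
    options = ["Berlin", "Paris", "Rome", "Madrid"]
    print(select_option_by_loglik("The capital of France is", options, toy_logprobs))
```

Because the scores are summed over tokens, longer options are penalised; some evaluation setups normalise by length instead, a design choice this sketch does not make.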

Thanks

Apr 22 '25 09:04 murthyrudra

It depends on the inference engine you use. If you use the inference engine in the example, then the first option you mentioned is exactly what happens. If you use HFOptionSelectingInferenceEngine as the inference engine, it will use any Hugging Face model you give it to choose the answer from the given options based on log probabilities. In both cases, the final output of the inference engine is still a textual prediction, which is then post-processed. With HFOptionSelectingInferenceEngine, the post-processing won't do much, since the prediction is already one of the options.
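The final step described above, post-processing a textual prediction and comparing it with the gold answer, might look roughly like this. `extract_choice` and `accuracy` are hypothetical helpers, not unitxt's post-processors or metric, and the letter-parsing rule is an assumption chosen only for illustration. Note how an exact option (what an option-selecting engine returns) passes through unchanged.

```python
import re
from typing import Sequence


def extract_choice(prediction: str, options: Sequence[str]) -> str:
    """Hypothetical post-processor mapping a textual prediction to an option."""
    text = prediction.strip()
    # Output of an option-selecting engine is already an exact option: nothing to do.
    if text in options:
        return text
    # Otherwise accept a parenthesised letter such as "(B)" or a bare "B" / "B." / "B)".
    match = re.search(r"\(([A-Z])\)", text) or re.fullmatch(r"([A-Z])[.)]?", text)
    if match:
        index = ord(match.group(1)) - ord("A")
        if index < len(options):
            return options[index]
    # Unparseable output is left unchanged and will simply count as wrong.
    return text


def accuracy(predictions: Sequence[str], golds: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold answer."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)


if __name__ == "__main__":
    options = ["Berlin", "Paris", "Rome", "Madrid"]
    raw = ["The answer is (B)", "Paris", "C.", "I am not sure"]
    golds = ["Paris", "Paris", "Paris", "Paris"]
    processed = [extract_choice(r, options) for r in raw]
    print(processed, accuracy(processed, golds))
```

Here the generated answers need real parsing, while the option-selected answer ("Paris") is already in its final form.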

Apr 23 '25 11:04 elronbandel

This issue is stale because it has been open for 30 days with no activity.

May 24 '25 02:05 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

Jun 07 '25 02:06 github-actions[bot]