Confusion about the scores in logged model outputs
Hi team,
Thanks a lot for this package! I've been running the BBH benchmarks and logging the model responses, and I'm seeing something odd: the data seems to indicate that the model assigns lower scores to the correct answers. I'm wondering if there's something I'm missing (e.g., is a negative sign applied to the scores for some reason?). For example, for the phi-3.5 model, the label for doc_id 0 is False and the model response is:
[ [ "-3.5762786865234375e-07", "True" ], [ "-15.0", "False" ] ]
If I'm reading these as log-likelihoods (rather than raw logits), the score is higher for True, yet the acc_norm column seems to mark this response as correct. Am I misreading the scores that come with the model responses? I tried creating a discussion thread here, but figured I'd just ask the team directly.
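For reference, here's a minimal sketch of how I'm currently interpreting the logged pairs. I'm assuming each entry is [log-likelihood, answer choice] and that the prediction is simply the choice with the highest (least negative) score; please correct me if either assumption is wrong:

```python
import json

# The logged response for doc_id 0, copied from above.
resp = json.loads('[["-3.5762786865234375e-07", "True"], ["-15.0", "False"]]')

# Assuming each pair is [log-likelihood, answer choice], the predicted
# answer should be the choice with the highest (least negative) score.
predicted = max(resp, key=lambda pair: float(pair[0]))[1]
print(predicted)  # prints "True", but the label is False
```

Under this reading the prediction would be True, which is why the acc_norm result looks inverted to me.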
Thanks,
Linh