Why are empty responses ignored in LongBench v2?

Open junhuihe-hjh opened this issue 11 months ago • 1 comment

We noticed that in pred.py (lines 98-99), empty responses are ignored and not included in the final score. Is this approach reasonable? We are concerned that models might exploit this by simply not responding to questions they are unsure about.

Jan 15 '25 08:01 junhuihe-hjh

Hi! Empty responses only occur when an exception is raised during model calls, as seen here: https://github.com/THUDM/LongBench/blob/main/pred.py#L54. During evaluation, models always output some response, even when unsure, and never return an empty string.

Jan 23 '25 05:01 bys0318
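
For readers following along, here is a minimal sketch of the behavior discussed above. It is not the actual code in pred.py: the names query_model, collect_predictions, and accuracy, the retry count, and the item fields are all assumptions made for illustration. It shows how an empty string can arise only from a failed model call, and how skipping those items changes the denominator of the score.

```python
from typing import Callable, Dict, List, Optional, Tuple


def query_model(call: Callable[[str], str], prompt: str, max_tries: int = 3) -> str:
    """Return the model's reply, or "" if every attempt raises (hypothetical helper)."""
    for _ in range(max_tries):
        try:
            return call(prompt)
        except Exception:
            # A failed call (timeout, API error, ...) is retried.
            continue
    # All attempts failed: the empty string marks "no response".
    return ""


def collect_predictions(
    call: Callable[[str], str], items: List[Dict]
) -> Tuple[List[Dict], int]:
    """Query each item; skip items whose response is empty and count them."""
    kept: List[Dict] = []
    skipped = 0
    for item in items:
        response = query_model(call, item["prompt"])
        if response == "":
            skipped += 1  # not written out, so it never reaches scoring
            continue
        kept.append(
            {"id": item["id"], "pred": response.strip(), "answer": item["answer"]}
        )
    return kept, skipped


def accuracy(kept: List[Dict], total: Optional[int] = None) -> float:
    """Accuracy over the kept items, or over `total` items if it is given."""
    denominator = len(kept) if total is None else total
    if denominator == 0:
        return 0.0
    correct = sum(1 for row in kept if row["pred"] == row["answer"])
    return correct / denominator
```

Under these assumptions, accuracy(kept) scores only the items that produced a response, while accuracy(kept, total=len(items)) counts skipped items as wrong. Reporting the skipped count next to the score makes it easy to check that exclusions come from call failures and not from the model declining to answer.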