Why are empty responses ignored in LongBench v2?

Open • junhuihe-hjh opened this issue 11 months ago • 1 comment

We noticed that in pred.py (lines 98-99), empty responses are ignored and not included in the final score. Is this approach reasonable? We are concerned that models might exploit this by simply not responding to questions they are unsure about.
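To make the concern concrete, here is a minimal sketch of how skipping empty responses changes the reported accuracy compared with counting them as wrong. It is not taken from pred.py: the `accuracy` function, its `count_empty_as_wrong` flag, and the `(response, is_correct)` record format are all hypothetical and only illustrate the arithmetic.

```python
def accuracy(records, count_empty_as_wrong=False):
    """Compute accuracy over (response, is_correct) pairs.

    Hypothetical record format, for illustration only.
    When count_empty_as_wrong is False, empty responses are dropped
    from the denominator; when True, they stay in and count as wrong.
    """
    scored = [
        (resp, ok) for resp, ok in records
        if count_empty_as_wrong or resp != ""
    ]
    if not scored:
        return 0.0
    # An empty response can never be counted as correct.
    correct = sum(1 for resp, ok in scored if ok and resp != "")
    return correct / len(scored)


# Four questions: one right, one wrong, two left unanswered.
records = [("A", True), ("B", False), ("", False), ("", False)]

skip_empty = accuracy(records)                         # 1 of 2 scored
strict = accuracy(records, count_empty_as_wrong=True)  # 1 of 4 scored
```

Under these assumptions, a model that could abstain on the two uncertain questions would report a higher score when empty responses are skipped than when they are counted as wrong.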

junhuihe-hjh • Jan 15 '25 08:01

Hi! Empty responses only occur when an exception is raised during model calls, as seen here: https://github.com/THUDM/LongBench/blob/main/pred.py#L54. During evaluation, models always output some response, even when unsure, and never return an empty string.
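The pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in pred.py: `query_with_retries`, `run`, the `call_model` callable, and the item fields are hypothetical names. The point is only that an empty string is produced when every model call raises, and such items are then skipped.

```python
def query_with_retries(call_model, prompt, max_tries=3):
    """Call the model, returning "" only if every attempt raises.

    call_model is a hypothetical callable that takes a prompt string
    and returns the model's text output.
    """
    for _ in range(max_tries):
        try:
            return call_model(prompt)
        except Exception as exc:
            # A failed API call (timeout, rate limit, ...) lands here.
            print(f"model call failed: {exc}")
    return ""


def run(items, call_model):
    """Collect responses, skipping items whose model call failed."""
    results = []
    for item in items:
        output = query_with_retries(call_model, item["prompt"])
        if output == "":
            # Empty output signals an exception, not a model abstention.
            continue
        results.append({"id": item["id"], "response": output})
    return results
```

In this sketch an empty response marks an infrastructure failure rather than a choice made by the model, which is why it is left out instead of being scored as wrong.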

bys0318 • Jan 23 '25 05:01