InfiniteBench
Evaluation metric calculation
Dear author, I am confused by my results from evaluating Llama 3.1 on the longbook_qa_eng task. The model's outputs looked completely consistent with the reference answers, but the F1 score was 0.
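The symptom can be reproduced with a minimal sketch. It assumes a SQuAD-style token-level F1 (lowercase, strip punctuation and articles, compare whitespace tokens). That is an assumption, and the exact scorer in InfiniteBench's evaluation code may differ. A special token glued to the end of an otherwise correct answer changes the last token, so a single-word answer gets no overlap at all.

```python
import re
import string
from collections import Counter


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a prediction and a single reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)


# The answer text matches, but the trailing special token sticks to the last word:
# punctuation removal turns "paris<|eot_id|>" into "pariseotid", so no tokens overlap.
print(f1_score("Paris", "Paris"))            # 1.0
print(f1_score("Paris<|eot_id|>", "Paris"))  # 0.0
```

For multi-word answers the score is reduced rather than zeroed, e.g. "Charles Dickens<|eot_id|>" against "Charles Dickens" scores 0.5 under this sketch.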
It seems that <|eot_id|> (Llama 3's end-of-turn special token) is left in the generated output. You can pre-process the predictions to remove it before computing F1.
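One way to do that pre-processing is sketched below. The helper name strip_special_tokens and the regex are hypothetical, not part of InfiniteBench. The regex assumes Llama-3-style special tokens of the form <|name|> (for example <|eot_id|>, <|begin_of_text|>, <|reserved_special_token_0|>).

```python
import re

# Hypothetical helper, not InfiniteBench's API.
# Matches Llama-3-style special tokens such as <|eot_id|> or <|end_of_text|>.
SPECIAL_TOKEN_RE = re.compile(r"<\|[a-z0-9_]+\|>")


def strip_special_tokens(prediction: str) -> str:
    """Remove leftover special tokens from a model prediction before scoring."""
    return SPECIAL_TOKEN_RE.sub("", prediction).strip()


print(strip_special_tokens("Paris<|eot_id|>"))  # Paris
```

If generation goes through Hugging Face transformers, passing skip_special_tokens=True to tokenizer.decode keeps such tokens out of the decoded text in the first place, which avoids post-hoc string cleanup.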