InfiniteBench
Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
Could you please provide the code for generating samples of Math.Find, Math.Calc, Code.RUN and Code.debug? I want to generate some test samples with a shorter length, since my model only supports...
https://github.com/OpenBMB/InfiniteBench/blob/main/src/compute_scores.py#L238 1. only one reference label is used for comparison; it would be better to loop over each answer in the label, e.g. label=['ECKER', 'COMMANDER BILL ECKER']; 2. the prediction phrase is split into words for...
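The suggested fix could be sketched as follows. This is a minimal illustration, not the repository's actual scoring code: the function name, the word-level matching rule, and the case folding are all assumptions made for the example.

```python
# A sketch of scoring against multiple reference labels, assuming the
# label field may be either a single string or a list of acceptable
# answers (e.g. ['ECKER', 'COMMANDER BILL ECKER']).
def first_word_match(prediction: str, labels) -> bool:
    """Return True if any reference answer is contained in the prediction."""
    if isinstance(labels, str):
        labels = [labels]  # normalize a single label to a one-element list
    pred_words = prediction.upper().split()
    for label in labels:  # loop over every acceptable answer, not just the first
        label_words = label.upper().split()
        # accept if every word of this label appears in the prediction
        if all(w in pred_words for w in label_words):
            return True
    return False
```

With the example from the issue, a prediction containing "ECKER" would then score as correct even when the longer form "COMMANDER BILL ECKER" does not match.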
When I try to run the following code in Colab: `from datasets import load_dataset; dataset = load_dataset("xinrongzhang2022/InfiniteBench")` I get the following error: > DatasetGenerationCastError: An error occurred while generating the...
GPT-4o
How is GPT-4 run if the API has a hard cutoff of 128k tokens? The EN.QA and EN.MC datasets look to be more than 128k tokens by themselves. Am I missing...
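A common way long-context evaluations fit an over-long prompt into a fixed window is middle truncation: keep the head and tail of the token sequence and drop the middle, since instructions and questions usually sit at the ends. This is only a sketch of that general technique, not a claim about what this repository does:

```python
# Middle truncation: keep the first and last tokens of an over-long
# sequence so the total stays within max_tokens. `tokens` is any list
# (e.g. token ids); the split point is an illustrative choice.
def truncate_middle(tokens: list, max_tokens: int) -> list:
    if len(tokens) <= max_tokens:
        return tokens  # already fits, nothing to drop
    half = max_tokens // 2
    # keep `half` tokens from the head and the remainder from the tail
    return tokens[:half] + tokens[len(tokens) - (max_tokens - half):]
```

For example, truncating a 10-token sequence to 4 tokens keeps the first two and last two tokens.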
Are the GPT-4 results evaluated on a different set of `longbook_qa_eng`? The 'ground_truth' fields in [results/gpt4/preds_longbook_qa_eng.jsonl](https://github.com/OpenBMB/InfiniteBench/blob/main/results/gpt4/preds_longbook_qa_eng.jsonl) don't seem to match the ground_truth in [results/chatglm3/preds_longbook_qa_eng.jsonl](https://github.com/OpenBMB/InfiniteBench/blob/main/results/chatglm3/preds_longbook_qa_eng.jsonl)