Xuandi FU

Results: 1 issue by Xuandi FU

Are the GPT-4 results evaluated on a different set of `longbook_qa_eng`? The `ground_truth` fields in [results/gpt4/preds_longbook_qa_eng.jsonl](https://github.com/OpenBMB/InfiniteBench/blob/main/results/gpt4/preds_longbook_qa_eng.jsonl) don't seem to match the `ground_truth` fields in [results/chatglm3/preds_longbook_qa_eng.jsonl](https://github.com/OpenBMB/InfiniteBench/blob/main/results/chatglm3/preds_longbook_qa_eng.jsonl).
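For anyone who wants to reproduce the comparison, here is a rough sketch of one way to check it. It assumes each line of both files is a JSON object with a `ground_truth` key, and that the two files list examples in the same order; I haven't confirmed either, so a positional mismatch could also just mean the ordering differs. The helper names `load_ground_truths` and `compare_ground_truths` are made up for this example and are not part of the InfiniteBench code.

```python
import json
from pathlib import Path


def load_ground_truths(path):
    """Return the 'ground_truth' value from each non-empty JSONL line, in file order."""
    truths = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # .get() yields None if a record has no 'ground_truth' key
                truths.append(json.loads(line).get("ground_truth"))
    return truths


def compare_ground_truths(path_a, path_b):
    """Compare two prediction files position by position.

    Returns (count_a, count_b, mismatched_indices). Assumes both files
    list examples in the same order, which may not hold in practice.
    """
    a = load_ground_truths(path_a)
    b = load_ground_truths(path_b)
    # zip() stops at the shorter file, so compare the counts as well
    mismatches = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return len(a), len(b), mismatches
```

Calling `compare_ground_truths("results/gpt4/preds_longbook_qa_eng.jsonl", "results/chatglm3/preds_longbook_qa_eng.jsonl")` from a local checkout would give the two record counts and the positions where the `ground_truth` values differ. If the counts differ, or most positions mismatch, that would suggest the files come from different subsets or orderings rather than a few isolated label differences.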