LongBench

[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Results 66 LongBench issues

Hi! I'm working on a long document QA problem and looked into the MultiFieldQA-en dataset recently. I downloaded the dataset using the following code snippet: ``` from datasets import load_dataset...

As the title says, my evaluation of `Llama2-7B-chat-4k` on `PassageRetrieval-zh` scores `10.12`, which is significantly higher than the 0.5 reported in the README. Could you please explain why?

For Multi-DocumentQA, is there a simple way to know which passages a given answer is retrieved from? It would be very helpful to have a field called 'retrieval_indices'...

enhancement

In Figure 6 of the paper I saw a plot of correlation coefficients, and I would like to ask how these coefficients are computed. https://arxiv.org/pdf/2308.14508.pdf
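The issue does not say which coefficient Figure 6 uses, so the following is only a generic illustration, not the paper's method. Assuming two equal-length lists of scores (for example, several models' results on two tasks), a Pearson correlation coefficient could be computed as sketched here; the function name `pearson` is made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A rank-based coefficient such as Spearman's would be computed the same way after replacing each score with its rank.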

The server cannot connect to Hugging Face, so I only replaced the THU/Longbench path in pred.py with the local /home/eval/LongBench/data; the model path has also been added to the config file. The error is as follows: CUDA_VISIBLE_DEVICES=7 python pred.py --model llama2-13b-chat-16k Resolving data files: 100%|████████████████████████████████████| 34/34 [00:00
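For an offline setup like the one described, one option is to read the local files directly instead of going through the hub. This is a minimal sketch, assuming the local directory holds JSON Lines files (one JSON object per line); the helper name `load_jsonl` is made up here, and this is not how pred.py itself loads data.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line from a local file."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

It would be called with the path of one file under the local data directory; the file names depend on how the data was downloaded.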

Hi, do you report the Mistral-7B results? Thank you!

Is this caused by multiprocessing? Would reducing the number of processes fix it?

Hi, I found that the original script cannot handle large models on long contexts effectively, since it uses multiprocessing to load an entire model on a single GPU. I also...

In the TREC dataset in the test data, one of the labels, "Lasting time of something", is misspelled everywhere as "Lasting time of somethin".
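Until the data is corrected upstream, the misspelled label can be patched locally after loading. This is a minimal sketch, assuming each record is a dict whose label fields are lists of strings; the field names `answers` and `all_classes` and the helper names are assumptions for illustration, so adjust them to the actual schema.

```python
TYPO = "Lasting time of somethin"
FIXED = "Lasting time of something"


def fix_label(label):
    """Return the corrected label if it exactly matches the misspelled one."""
    return FIXED if label == TYPO else label


def fix_record(record, fields=("answers", "all_classes")):
    """Return a copy of record with the typo fixed in list-valued label fields.

    The default field names are assumptions, not a confirmed schema.
    """
    fixed = dict(record)
    for field in fields:
        values = fixed.get(field)
        if isinstance(values, list):
            # Build a new list so the original record is left untouched.
            fixed[field] = [fix_label(v) for v in values]
    return fixed
```

Because the match is exact, labels that are already spelled correctly pass through unchanged.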