LongBench
[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Hi! I'm working on a long-document QA problem and recently looked into the MultiFieldQA-en dataset. I downloaded the dataset using the following code snippet: ``` from datasets import load_dataset...
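The snippet above is truncated, so here is a hedged sketch of inspecting one sample once it is loaded. The field names (`input`, `context`, `answers`) are assumptions based on the LongBench dataset card, not confirmed by this thread, and the `load_dataset` call is shown commented out since it needs network access.

```python
# Assumed loading call (requires huggingface.co access):
# from datasets import load_dataset
# data = load_dataset("THUDM/LongBench", "multifieldqa_en", split="test")

def summarize_sample(sample: dict) -> dict:
    """Return a compact view of one LongBench-style QA sample.
    Field names are assumptions from the dataset card."""
    return {
        "question": sample["input"],
        "context_chars": len(sample["context"]),
        "n_answers": len(sample["answers"]),
    }

# A toy sample standing in for one real dataset row.
example = {
    "input": "What year was the report published?",
    "context": "The annual report, published in 2019, covers ...",
    "answers": ["2019"],
}
print(summarize_sample(example))
```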
As the title says, my evaluation of `Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`, which is significantly higher than the README value (0.5). Could you please share why?
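One common source of such discrepancies is how the predicted paragraph index is extracted from the model output before comparison. Below is a sketch of a substring/regex-based retrieval score in that spirit; it is an illustration of the general idea, not necessarily the repo's exact metric.

```python
import re

def retrieval_zh_score_sketch(prediction: str, ground_truth: str) -> float:
    """Sketch of a PassageRetrieval-zh style score: pull the first
    paragraph reference (e.g. "段落3") out of the model output and
    compare it with the gold label. Hypothetical, for illustration."""
    matches = re.findall(r"段落\d+", prediction)
    if not matches:
        return 0.0
    return 1.0 if matches[0] == ground_truth else 0.0

# A prediction citing the right paragraph scores 1, anything else 0.
print(retrieval_zh_score_sketch("答案在段落3中。", "段落3"))
```

If one implementation checks only the first cited paragraph and another checks whether the gold label appears anywhere in the output, their scores can diverge substantially on chatty model outputs.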
For Multi-Document QA, is there a simple way to know which passages a given answer is retrieved from? It would be very helpful to have a field called 'retrieval_indices'...
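Since the dataset has no such field, a rough workaround is to infer the supporting passages by searching for the gold answer strings. The helper below is hypothetical; plain substring matching will miss paraphrased answers and would need normalization in practice.

```python
def infer_retrieval_indices(passages: list[str], answers: list[str]) -> list[int]:
    """Hypothetical stand-in for a 'retrieval_indices' field: return
    the indices of passages containing any gold answer string verbatim."""
    return [
        i for i, passage in enumerate(passages)
        if any(ans in passage for ans in answers)
    ]

passages = [
    "The treaty was signed in Vienna.",
    "It entered into force in 1995.",
    "Trade volumes rose sharply afterwards.",
]
print(infer_retrieval_indices(passages, ["1995"]))  # [1]
```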
In Figure 6 of the paper there is a plot of correlation coefficients; I would like to ask how this coefficient is computed. https://arxiv.org/pdf/2308.14508.pdf
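The thread does not spell out the formula. A common choice for such correlation heatmaps is the Pearson coefficient over pairs of per-model score vectors; the sketch below assumes Pearson, which the paper may or may not use.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length
    score vectors (e.g. per-model scores on two tasks)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linearly related scores give a coefficient close to 1.0.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```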
The server cannot connect to huggingface, so I only replaced the THU/Longbench path in pred.py with the local /home/eval/LongBench/data; the model path has also been added to the config file. The error is as follows: CUDA_VISIBLE_DEVICES=7 python pred.py --model llama2-13b-chat-16k Resolving data files: 100%|████████████████████████████████████| 34/34 [00:00
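When huggingface.co is unreachable, the local data files can also be read directly. The sketch below assumes the local directory holds JSONL files (one JSON object per line), which is an assumption about the layout; the demo uses a temporary file standing in for e.g. a task file under the local data directory.

```python
import json
import os
import tempfile

def load_jsonl(path: str) -> list[dict]:
    """Read a LongBench-style .jsonl file: one JSON object per line.
    The .jsonl layout of the local data directory is an assumption."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo with a temporary file in place of a real local task file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "multifieldqa_en.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"input": "q", "answers": ["a"]}) + "\n")
    samples = load_jsonl(path)

print(len(samples), samples[0]["input"])
```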
Hi, do you report the Mistral-7B results? Thank you!
Is this caused by multiprocessing? Would reducing the number of processes fix it?
Hi, I found that the original script cannot handle large models on long contexts effectively, since it uses multiprocessing to load an entire model onto a single GPU. I also...
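The scheme described above is data parallelism: each process holds a full model replica on one GPU and takes a share of the samples, which can be sketched as a round-robin split. For models too large for one GPU, one would instead shard the model itself across devices (e.g. with `device_map="auto"` in `transformers`, mentioned here as an assumption about the fix, not something the thread confirms).

```python
def shard_indices(n_samples: int, world_size: int, rank: int) -> list[int]:
    """Round-robin split of sample indices across `world_size` worker
    processes; each process runs its own full model replica on one GPU.
    This sketches the data-parallel scheme the comment describes."""
    return list(range(rank, n_samples, world_size))

# 10 samples over 4 processes: shards are balanced and cover every sample once.
shards = [shard_indices(10, 4, r) for r in range(4)]
print(shards)
```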
In the TREC dataset in the test data, the label "Lasting time of something" is consistently misspelled as "Lasting time of somethin".
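Since TREC is a classification task, the misspelled label would break exact-match scoring against the correct spelling. A hypothetical workaround until the data is fixed is to normalize the known misspelling before comparison:

```python
# Hypothetical fix-up table for known label misspellings in the data.
LABEL_FIXES = {"Lasting time of somethin": "Lasting time of something"}

def normalize_label(label: str) -> str:
    """Map a known-misspelled label to its corrected form,
    leaving all other labels unchanged."""
    return LABEL_FIXES.get(label, label)

print(normalize_label("Lasting time of somethin"))
```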