LongBench
[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Hi! I'm working on a long-document QA problem and recently looked into the MultiFieldQA-en dataset. I downloaded the dataset using the following code snippet: ``` from datasets import load_dataset...
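The snippet above is truncated, so here is a hedged sketch of inspecting one sample once it is loaded. The field names (`input`, `context`, `answers`) are assumptions based on the LongBench dataset card, not confirmed by this thread, and the `load_dataset` call is shown commented out since it needs network access.

```python
# Assumed loading call (requires huggingface.co access):
# from datasets import load_dataset
# data = load_dataset("THUDM/LongBench", "multifieldqa_en", split="test")

def summarize_sample(sample: dict) -> dict:
    """Return a compact view of one LongBench-style QA sample.
    Field names are assumptions from the dataset card."""
    return {
        "question": sample["input"],
        "context_chars": len(sample["context"]),
        "n_answers": len(sample["answers"]),
    }

# A toy sample standing in for one real dataset row.
example = {
    "input": "What year was the report published?",
    "context": "The annual report, published in 2019, covers ...",
    "answers": ["2019"],
}
print(summarize_sample(example))
```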
As the title says, my evaluation of `Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`, which is significantly higher than the README value (0.5). Could you please share why?
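One common source of such discrepancies is how the predicted paragraph index is extracted from the model output before comparison. Below is a sketch of a substring/regex-based retrieval score in that spirit; it is an illustration of the general idea, not necessarily the repo's exact metric.

```python
import re

def retrieval_zh_score_sketch(prediction: str, ground_truth: str) -> float:
    """Sketch of a PassageRetrieval-zh style score: pull the first
    paragraph reference (e.g. "段落3") out of the model output and
    compare it with the gold label. Hypothetical, for illustration."""
    matches = re.findall(r"段落\d+", prediction)
    if not matches:
        return 0.0
    return 1.0 if matches[0] == ground_truth else 0.0

# A prediction citing the right paragraph scores 1, anything else 0.
print(retrieval_zh_score_sketch("答案在段落3中。", "段落3"))
```

If one implementation checks only the first cited paragraph and another checks whether the gold label appears anywhere in the output, their scores can diverge substantially on chatty model outputs.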
For Multi-Document QA, is there a simple way to know which passages a given answer is retrieved from? It would be very helpful to have a field called 'retrieval_indices'...
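Since the dataset has no such field, a rough workaround is to infer the supporting passages by searching for the gold answer strings. The helper below is hypothetical; plain substring matching will miss paraphrased answers and would need normalization in practice.

```python
def infer_retrieval_indices(passages: list[str], answers: list[str]) -> list[int]:
    """Hypothetical stand-in for a 'retrieval_indices' field: return
    the indices of passages containing any gold answer string verbatim."""
    return [
        i for i, passage in enumerate(passages)
        if any(ans in passage for ans in answers)
    ]

passages = [
    "The treaty was signed in Vienna.",
    "It entered into force in 1995.",
    "Trade volumes rose sharply afterwards.",
]
print(infer_retrieval_indices(passages, ["1995"]))  # [1]
```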
In Figure 6 of the paper there is a plot of correlation coefficients; I would like to ask how this coefficient is computed. https://arxiv.org/pdf/2308.14508.pdf
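The thread does not spell out the formula. A common choice for such correlation heatmaps is the Pearson coefficient over pairs of per-model score vectors; the sketch below assumes Pearson, which the paper may or may not use.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length
    score vectors (e.g. per-model scores on two tasks)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linearly related scores give a coefficient close to 1.0.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```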
The server cannot connect to huggingface, so I only replaced the THU/Longbench path in pred.py with the local /home/eval/LongBench/data; the model path has also been added to the config file. The error is as follows: CUDA_VISIBLE_DEVICES=7 python pred.py --model llama2-13b-chat-16k Resolving data files: 100%|████████████████████████████████████| 34/34 [00:00
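When huggingface.co is unreachable, the local data files can also be read directly. The sketch below assumes the local directory holds JSONL files (one JSON object per line), which is an assumption about the layout; the demo uses a temporary file standing in for e.g. a task file under the local data directory.

```python
import json
import os
import tempfile

def load_jsonl(path: str) -> list[dict]:
    """Read a LongBench-style .jsonl file: one JSON object per line.
    The .jsonl layout of the local data directory is an assumption."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo with a temporary file in place of a real local task file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "multifieldqa_en.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"input": "q", "answers": ["a"]}) + "\n")
    samples = load_jsonl(path)

print(len(samples), samples[0]["input"])
```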
Hi, do you report the Mistral-7B results? Thank you!
Is this caused by multiprocessing? Would reducing the number of processes fix it?
Hi, I found that the original script cannot handle large models on long contexts effectively, since it uses multiprocessing to load an entire model onto a single GPU. I also...
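The scheme described above is data parallelism: each process holds a full model replica on one GPU and takes a share of the samples, which can be sketched as a round-robin split. For models too large for one GPU, one would instead shard the model itself across devices (e.g. with `device_map="auto"` in `transformers`, mentioned here as an assumption about the fix, not something the thread confirms).

```python
def shard_indices(n_samples: int, world_size: int, rank: int) -> list[int]:
    """Round-robin split of sample indices across `world_size` worker
    processes; each process runs its own full model replica on one GPU.
    This sketches the data-parallel scheme the comment describes."""
    return list(range(rank, n_samples, world_size))

# 10 samples over 4 processes: shards are balanced and cover every sample once.
shards = [shard_indices(10, 4, r) for r in range(4)]
print(shards)
```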
In the TREC dataset in the test data, the label "Lasting time of something" is consistently misspelled as "Lasting time of somethin".
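Since TREC is a classification task, the misspelled label would break exact-match scoring against the correct spelling. A hypothetical workaround until the data is fixed is to normalize the known misspelling before comparison:

```python
# Hypothetical fix-up table for known label misspellings in the data.
LABEL_FIXES = {"Lasting time of somethin": "Lasting time of something"}

def normalize_label(label: str) -> str:
    """Map a known-misspelled label to its corrected form,
    leaving all other labels unchanged."""
    return LABEL_FIXES.get(label, label)

print(normalize_label("Lasting time of somethin"))
```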