Yushi Bai comments

Results: 102 comments of Yushi Bai

Chinese Examples in MultiFieldQA-en

Hi! They are classified as English samples as they contain more English characters (a-zA-Z) than Chinese characters.
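The counting rule described above can be sketched as follows. This is only an illustration, not the benchmark's actual code: the function name `classify_language`, the use of the CJK Unified Ideographs range as the definition of "Chinese characters", and the handling of ties are all assumptions.

```python
import re


def classify_language(text: str) -> str:
    """Label a sample 'en' or 'zh' by comparing character counts (hypothetical helper)."""
    # English characters: a-z and A-Z, as stated in the comment above.
    n_en = len(re.findall(r"[a-zA-Z]", text))
    # Chinese characters: assumed here to mean the CJK Unified Ideographs block.
    n_zh = len(re.findall(r"[\u4e00-\u9fff]", text))
    # A sample counts as English only when English characters are strictly more numerous;
    # sending ties to 'zh' is an assumption of this sketch.
    return "en" if n_en > n_zh else "zh"
```

Under this rule, a sample that mixes a short Chinese passage with a longer run of English text would be labeled `en`, which is consistent with Chinese text appearing in an English subset.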

`Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`

Hi! Are you using the prompt template as in [config/dataset2prompt.json](https://github.com/THUDM/LongBench/blob/main/config/dataset2prompt.json)?

`Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`

Please refer to our code here for the Llama2 prompt: https://github.com/THUDM/LongBench/blob/main/pred.py#L33

How is the Spearman correlation computed?

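The correlation coefficient is based on the scores of the 8 models we evaluated on each task: for every pair of tasks, we compute the correlation between their two sets of 8 scores.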
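A minimal sketch of that pairwise computation is given below. It is not the authors' script: the function names are hypothetical, ties are handled with average ranks (an assumption), and any scores passed in would need to be the real per-model results, which are not reproduced here.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_task_correlation(scores):
    """scores maps task name -> list of per-model scores (same model order in every list)."""
    return {(a, b): spearman(scores[a], scores[b]) for a, b in combinations(scores, 2)}
```

Here `scores` would hold one list of 8 numbers per task, with the models in the same order in every list, and the result has one entry per unordered pair of tasks.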

Evaluate on long context (32k,64k etc..) on 30B/70B large models

You are right. We will update soon to also support 30B/70B models with accelerate/deepspeed.

Evaluate on long context (32k,64k etc..) on 30B/70B large models

Hi @CaesarWWK, thanks for your reply! @lvjianxin, an easy way (without modifying much of the current codebase) might be to add `device_map="auto"` to the model loading lines in `load_model_and_tokenizer()`. It...
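As a rough illustration of the `device_map="auto"` suggestion, the sketch below shows how the option could be passed to Hugging Face `from_pretrained`. The helper names `build_load_kwargs` and `load_model_and_tokenizer_sharded` are hypothetical and do not mirror the actual body of `load_model_and_tokenizer()` in `pred.py`; the sketch also assumes the `transformers` and `accelerate` packages are installed.

```python
def build_load_kwargs(shard_across_gpus: bool = True) -> dict:
    """Build extra keyword arguments for from_pretrained (hypothetical helper)."""
    kwargs = {}
    if shard_across_gpus:
        # device_map="auto" lets accelerate place the model's layers across
        # the available devices instead of loading everything onto one GPU.
        kwargs["device_map"] = "auto"
    return kwargs


def load_model_and_tokenizer_sharded(path: str):
    """Load a causal LM with its weights spread over the visible devices (sketch only)."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, **build_load_kwargs())
    return model.eval(), tokenizer
```

A model loaded this way is already placed on devices, so it would typically not be moved again with `.to(device)`; inputs can be sent to `model.device` instead.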

Any Implementation of Mistral-7B?

Hi, we haven't officially evaluated Mistral-7B on LongBench. However, I have seen that this [paper](https://arxiv.org/abs/2401.01325) carried out the evaluation.

Error: TypeError: Couldn't cast array of type list<item: string> to null

If you have already downloaded LongBench's `data/` directory locally, you can load the dataset by reading the files directly. Change [line 166](https://github.com/THUDM/LongBench/blob/main/pred.py#L166) of `pred.py` to:

```python
data = [json.loads(line) for line in open(f"LongBench/data/{dataset}.jsonl", encoding="utf-8")]
```

Include data on which passage contains answer

Thanks for your suggestion! We will consider adding this feature to our dataset.

About the typo in the TREC dataset

Thanks for pointing this out. We will update the dataset.