LongBench
Chinese Examples in MultiFieldQA-en
Hi! I'm working on a long document QA problem and looked into the MultiFieldQA-en dataset recently.
I downloaded the dataset using the following code snippet:
from datasets import load_dataset
dataset = load_dataset("THUDM/LongBench", "multifieldqa_en")
While examining the content, I noticed that 2 of the 150 entries are in Chinese rather than English.
Can you please take a look? Thank you!
Hi! They are classified as English samples because they contain more English characters (a-zA-Z) than Chinese characters.
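For readers who want to reproduce this kind of check, here is a minimal sketch of a character-count rule like the one described above. It is not the maintainers' actual code: the function names `count_chars` and `classify_language` are hypothetical, "Chinese characters" is assumed to mean the CJK Unified Ideographs block (U+4E00 to U+9FFF), and ties are assumed to fall to Chinese.

```python
import re

# Assumption: English characters are a-zA-Z, as stated in the reply above.
_LATIN = re.compile(r"[a-zA-Z]")
# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def count_chars(text):
    """Return (number of a-zA-Z characters, number of CJK characters)."""
    return len(_LATIN.findall(text)), len(_CJK.findall(text))


def classify_language(text):
    """Label text 'en' if it has strictly more a-zA-Z characters than
    CJK characters, otherwise 'zh' (the tie-breaking is an assumption)."""
    latin, cjk = count_chars(text)
    return "en" if latin > cjk else "zh"
```

Under a rule like this, a sample that mixes both scripts can be labeled English even if a reader would consider it Chinese, since each Latin letter counts as one character while a Chinese word is often only one or two characters.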