LongBench
Loading local datasets with split="test"
I'm trying to evaluate a new model with LongBench and would like to load the datasets stored locally (downloaded and unzipped directly from HuggingFace). But whenever I read the data with split="test" in pred.py (say we are reading xxx.jsonl within the loop, with the line modified to data = load_dataset("json", data_files="/some/dir/xxx.jsonl", split="test")), it raises ValueError: Unknown split "test". Should be one of ['train']. Is there any pre-processing I should perform on the downloaded data? Thanks in advance.
If you have downloaded the dataset files locally, you can load them via:
import json
data = [json.loads(line) for line in open("/some/dir/xxx.jsonl", "r", encoding="utf-8")]
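For anyone who wants something a bit more defensive than the one-liner, here is a minimal sketch of the same idea wrapped in a helper. The name load_jsonl and the demo field names are made up for illustration and are not part of LongBench; the helper closes the file when done and skips blank lines, which would otherwise make json.loads fail.

```python
import json
import os
import tempfile


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file; the field names below are hypothetical.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"input": "q1", "answers": ["a1"]}\n')
        f.write("\n")  # a stray blank line, which the helper ignores
        f.write('{"input": "q2", "answers": ["a2"]}\n')
    demo_data = load_jsonl(demo_path)

print(len(demo_data))
```

In pred.py you would call it as data = load_jsonl("/some/dir/xxx.jsonl"). If you would rather keep the load_dataset call, the error comes from a bare file path being assigned to the default "train" split; passing a dict such as data_files={"test": "/some/dir/xxx.jsonl"} together with split="test" should name the split the way the script expects.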