Loading local datasets with split="test"

Open yichen0104 opened this issue 1 year ago • 1 comments

I’m trying to evaluate a new model with LongBench and would like to load the datasets stored locally (downloaded and unzipped directly from HuggingFace). But whenever I read the data with split="test" in pred.py (say we are reading xxx.jsonl within the loop, and the line is modified to data = load_dataset("json", data_files="/some/dir/xxx.jsonl", split="test")), it raises ValueError: Unknown split "test". Should be one of ['train']. Is there any pre-processing I should perform on the downloaded data? Thanks in advance.

May 30 '24 16:05 yichen0104

If you have downloaded the dataset files locally, you can load them via:

import json
data = [json.loads(line) for line in open("/some/dir/xxx.jsonl", "r", encoding="utf-8")]
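
For context, the error most likely arises because `load_dataset` puts files passed as a plain `data_files` path into a default "train" split, so asking for split="test" fails. The sketch below shows two ways around it: a plain-JSON reader that closes the file and skips blank lines, and a `datasets` call that maps the file to a "test" split explicitly. The helper names `load_jsonl` and `load_as_test_split` are hypothetical and do not come from pred.py; the second helper assumes the `datasets` package is installed.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_as_test_split(path):
    """Load the same file through `datasets`, labelling it as the "test" split.

    Hypothetical helper: passing a dict to data_files names the split,
    so split="test" then resolves instead of raising ValueError.
    """
    # Imported lazily so load_jsonl works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("json", data_files={"test": path}, split="test")
```

Either result can be iterated row by row as dicts, so the evaluation loop should not need to change. Keeping split="train" with the original `data_files` string would also avoid the error, since that is the split name the loader assigns by default.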

Jun 04 '24 10:06 bys0318