
Error: TypeError: Couldn't cast array of type list<item: string> to null

Open · xxcoco763 opened this issue 1 year ago · 1 comment

The server cannot connect to Hugging Face, so I only replaced the THU/Longbench path in pred.py with the local path /home/eval/LongBench/data. The model path has also been added to the config file. The error is as follows:

CUDA_VISIBLE_DEVICES=7 python pred.py --model llama2-13b-chat-16k
Resolving data files: 100%|████████████████████████████████████| 34/34 [00:00<00:00, 149169.81it/s]
Downloading data files: 100%|██████████████████████████████████████| 1/1 [00:00<00:00, 1417.95it/s]
Extracting data files: 100%|█████████████████████████████████████████| 1/1 [00:00<00:00, 87.24it/s]
Generating train split: 2500 examples [00:00, 4816.93 examples/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1940, in _prepare_split_single
    writer.write_table(table)
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/arrow_writer.py", line 572, in write_table
    pa_table = table_cast(pa_table, self._schema)
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2328, in table_cast
    return cast_table_to_schema(table, schema)
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2287, in cast_table_to_schema
    arrays = [cast_array_to_feature(table[name], feature) for name, feature in features.items()]
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2287, in <listcomp>
    arrays = [cast_array_to_feature(table[name], feature) for name, feature in features.items()]
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1831, in wrapper
    return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1831, in <listcomp>
    return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2143, in cast_array_to_feature
    return array_cast(array, feature(), allow_number_to_str=allow_number_to_str)
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1833, in wrapper
    return func(array, *args, **kwargs)
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2028, in array_cast
    raise TypeError(f"Couldn't cast array of type\n{array.type}\nto\n{pa_type}")
TypeError: Couldn't cast array of type
list<item: string>
to
null

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/zyx/eval/LongBench/pred.py", line 163, in <module>
    data = load_dataset('/root/zyx/eval/LongBench/data/data', dataset, split='test')
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/load.py", line 2153, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1813, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1958, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

How can I resolve this?

xxcoco763 · Feb 26 '24 08:02

If you have already downloaded LongBench's data/ directory locally, you can load the dataset by reading the file directly. Change line 166 of pred.py to:

data = [json.loads(line) for line in open(f"LongBench/data/{dataset}.jsonl", encoding="utf-8")]
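For readers who want the file handle closed explicitly, the same idea can be wrapped in a small helper. This is only a sketch: `load_local_jsonl` and its parameters are hypothetical names, not part of pred.py, and it assumes each dataset is stored as `<data_dir>/<dataset>.jsonl` with one JSON object per line. Because it parses each line with the standard `json` module, it does not go through the schema casting in `datasets` where the traceback above originates.

```python
import json
from pathlib import Path


def load_local_jsonl(data_dir, dataset):
    """Read <data_dir>/<dataset>.jsonl into a list of dicts, one per line.

    Hypothetical helper for illustration; not part of pred.py.
    """
    path = Path(data_dir) / f"{dataset}.jsonl"
    # The with-block closes the file even if a line fails to parse.
    with open(path, encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not break json.loads.
        return [json.loads(line) for line in f if line.strip()]
```

In pred.py this would presumably be called as `data = load_local_jsonl("LongBench/data", dataset)`. The result is a plain Python list rather than a `datasets.Dataset`, so it only suits code that iterates over the examples.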

bys0318 · Feb 27 '24 07:02