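My server cannot connect to huggingface, so I only replaced the THU/Longbench path in pred.py with the local path /home/eval/LongBench/data. The model path has also been added to the config file. The error is as follows: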
CUDA_VISIBLE_DEVICES=7 python pred.py --model llama2-13b-chat-16k
Resolving data files: 100%|████████████████████████████████████| 34/34 [00:00<00:00, 149169.81it/s]
Downloading data files: 100%|██████████████████████████████████████| 1/1 [00:00<00:00, 1417.95it/s]
Extracting data files: 100%|█████████████████████████████████████████| 1/1 [00:00<00:00, 87.24it/s]
Generating train split: 2500 examples [00:00, 4816.93 examples/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1940, in _prepare_split_single
writer.write_table(table)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/arrow_writer.py", line 572, in write_table
pa_table = table_cast(pa_table, self._schema)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2328, in table_cast
return cast_table_to_schema(table, schema)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2287, in cast_table_to_schema
arrays = [cast_array_to_feature(table[name], feature) for name, feature in features.items()]
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2287, in <listcomp>
arrays = [cast_array_to_feature(table[name], feature) for name, feature in features.items()]
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1831, in wrapper
return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1831, in <listcomp>
return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2143, in cast_array_to_feature
return array_cast(array, feature(), allow_number_to_str=allow_number_to_str)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1833, in wrapper
return func(array, *args, **kwargs)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2028, in array_cast
raise TypeError(f"Couldn't cast array of type\n{array.type}\nto\n{pa_type}")
TypeError: Couldn't cast array of type
list<item: string>
to
null
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/zyx/eval/LongBench/pred.py", line 163, in <module>
data = load_dataset('/root/zyx/eval/LongBench/data/data', dataset, split='test')
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/load.py", line 2153, in load_dataset
builder_instance.download_and_prepare(
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 954, in download_and_prepare
self._download_and_prepare(
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1813, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1958, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
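How can I fix this?
If you have already downloaded LongBench's data/ directory locally, you can load the dataset by reading the files directly. Change line 166 of pred.py to: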
data = [json.loads(line) for line in open(f"LongBench/data/{dataset}.jsonl", encoding="utf-8")]
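The one-liner above can also be written as a small helper that closes the file and skips blank lines. This is only a sketch: `load_jsonl` and `load_longbench_split` are hypothetical names, not part of pred.py, and the directory layout (one `<dataset>.jsonl` file per task) is assumed from the path used in the reply. Reading each file with `json` also avoids the schema cast in the traceback, which likely occurs because `load_dataset` tries to fit all 34 files into one schema while a column is null in some files and a list of strings in others.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line and return them as a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


def load_longbench_split(data_dir, dataset):
    # Assumption: data_dir is the local folder that holds "<dataset>.jsonl" files.
    return load_jsonl(Path(data_dir) / f"{dataset}.jsonl")
```

In pred.py this would be used as `data = load_longbench_split("LongBench/data", dataset)`. Because each file is parsed on its own, fields may differ in type from one task to another without causing an error.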