ChatGLM-Tuning
[Error during data preprocessing (tokenization)] datasets.builder.DatasetGenerationError
I followed the steps to generate the jsonl file, then ran the following command:
python tokenize_dataset_rows.py ^
--jsonl_path data/alpaca_data.jsonl ^
--save_path data/alpaca ^
--max_seq_length 200
It failed with this error:
E:\ChatGLM\ChatGLM3\ChatGLM-LoRA>python tokenize_dataset_rows.py ^
More? --jsonl_path data/alpaca_data.jsonl ^
More? --save_path data/alpaca ^
More? --max_seq_length 200
0%| | 0/52002 [00:00<?, ?it/s]
Generating train split: 0 examples [00:02, ? examples/s] | 0/52002 [00:00<?, ?it/s]
Traceback (most recent call last):
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1676, in _prepare_split_single
for key, record in generator:
File "e:\anaconda3\Lib\site-packages\datasets\packaged_modules\generator\generator.py", line 30, in _generate_examples
for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
File "E:\ChatGLM\ChatGLM3\ChatGLM-LoRA\tokenize_dataset_rows.py", line 31, in read_jsonl
feature = preprocess(tokenizer, config, example, max_seq_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ChatGLM\ChatGLM3\ChatGLM-LoRA\tokenize_dataset_rows.py", line 10, in preprocess
prompt = example["text"]
~~~~~~~^^^^^^^^
KeyError: 'text'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\ChatGLM\ChatGLM3\ChatGLM-LoRA\tokenize_dataset_rows.py", line 53, in <module>
main()
File "E:\ChatGLM\ChatGLM3\ChatGLM-LoRA\tokenize_dataset_rows.py", line 46, in main
dataset = datasets.Dataset.from_generator(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "e:\anaconda3\Lib\site-packages\datasets\arrow_dataset.py", line 1072, in from_generator
).read()
^^^^^^
File "e:\anaconda3\Lib\site-packages\datasets\io\generator.py", line 47, in read
self.builder.download_and_prepare(
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 954, in download_and_prepare
self._download_and_prepare(
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1717, in _download_and_prepare
super()._download_and_prepare(
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1049, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1555, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "e:\anaconda3\Lib\site-packages\datasets\builder.py", line 1712, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
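Reading the traceback, the innermost failure is `KeyError: 'text'` at `prompt = example["text"]`, so at least one row parsed from the JSONL apparently has no `text` key. Below is a minimal sketch for checking which keys the rows actually contain. The helper `count_key_sets` and the sample rows (including their key names) are hypothetical, written only for illustration; they are not part of the repo or taken from the real data file.

```python
import json
from collections import Counter


def count_key_sets(lines, limit=100):
    """Count the distinct sets of top-level keys in the first `limit` non-empty JSONL lines."""
    counts = Counter()
    seen = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        counts[tuple(sorted(row.keys()))] += 1
        seen += 1
        if seen >= limit:
            break
    return counts


# Hypothetical sample rows, for illustration only (key names are assumptions).
sample = [
    '{"context": "Instruction: say hi", "target": "hi"}',
    '{"context": "Instruction: add 1 and 1", "target": "2"}',
]
print(count_key_sets(sample))

# To inspect the real file instead (path taken from the command above):
# with open("data/alpaca_data.jsonl", encoding="utf-8") as f:
#     print(count_key_sets(f))
```

If no key set in the output contains `text`, then the key read in `preprocess` and the keys written into the jsonl file do not match, and one side would need to be changed to agree with the other.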
I searched around but could not find a working fix.
Could anyone help me out?