[WeClone] I | 20:05:01 | Loading configuration from: ./settings.jsonc
[WeClone] I | 20:05:01 | Chat-record blocked words: ['例如 密码', '例如 姓名', '//.....']
[WeClone] I | 20:05:01 | Found 1 CSV file, starting processing
[WeClone] D | 20:05:01 | Started processing CSV file: ./dataset/csv\55954793313@chatroom\55954793313@chatroom_0_64.csv
[WeClone] D | 20:05:01 | Finished processing: ./dataset/csv\55954793313@chatroom\55954793313@chatroom_0_64.csv, loaded 51 messages
[WeClone] S | 20:05:01 | Chat records processed successfully, 0 records in total, saved to ./dataset/res_csv/sft/sft-my.json
[WeClone] I | 20:05:09 | Starting to compute cutoff_len......
Setting num_proc from 16 back to 1 for the train split to disable multiprocessing as it only contains one shard.
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
File "D:\weclone\Weclone\weclone\utils\length_cdf.py", line 73, in <module>
fire.Fire(length_cdf)
File "D:\weclone\Weclone.venv\Lib\site-packages\fire\core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\fire\core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone\weclone\utils\length_cdf.py", line 56, in length_cdf
trainset = get_dataset(template, model_args, data_args, training_args, "sft", **tokenizer_module)["train_dataset"] # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\llamafactory\data\loader.py", line 307, in get_dataset
dataset = _get_merged_dataset(data_args.dataset, model_args, data_args, training_args, stage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\llamafactory\data\loader.py", line 179, in _get_merged_dataset
datasets[dataset_name] = _load_single_dataset(dataset_attr, model_args, data_args, training_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\llamafactory\data\loader.py", line 128, in _load_single_dataset
dataset = load_dataset(
^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\datasets\load.py", line 2163, in load_dataset
ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\datasets\builder.py", line 1126, in as_dataset
datasets = map_nested(
^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\datasets\utils\py_utils.py", line 484, in map_nested
mapped = function(data_struct)
^^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\datasets\builder.py", line 1156, in _build_single_dataset
ds = self._as_dataset(
^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\datasets\builder.py", line 1230, in _as_dataset
dataset_kwargs = ArrowReader(cache_dir, self.info).read(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\weclone\Weclone.venv\Lib\site-packages\datasets\arrow_reader.py", line 251, in read
raise ValueError(msg)
ValueError: Instruction "train" corresponds to no data!
[WeClone] E | 20:05:11 | Command 'D:\weclone\Weclone.venv\Scripts\python.exe weclone\utils\length_cdf.py --model_name_or_path="./modelQwen" --dataset="wechat-sft" --dataset_dir="./dataset/res_csv/sft" --template="qwen" --interval=256' failed with return code 1
[WeClone] S | 20:05:11 | Chat records processed successfully, 0 records in total, saved to ./dataset/res_csv/sft/sft-my.json