Custom training set (custom_train_dataset_path) and validation set (custom_val_dataset_path) are incorrectly split according to dataset_test_ratio

Open tiandiweizun opened this issue 1 month ago • 0 comments

Describe the bug

The data parameters of SftArguments are set as follows (all unrelated parameters are omitted):

dataset=[f'{DatasetName.alpaca_zh}#100', f'{DatasetName.alpaca_en}#50', f'{DatasetName.self_cognition}#250'],
custom_train_dataset_path=["./data/faq_train.jsonl"],
custom_val_dataset_path=["./data/faq_valid.jsonl"],

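dataset_test_ratio defaults to 0.01. As a result, when paths are passed to custom_train_dataset_path and custom_val_dataset_path, both files are split at a ratio of 0.01: 99% of the content of custom_train_dataset_path and custom_val_dataset_path is used for training and 1% is used as the validation set. This does not match the intended logic, since an explicitly supplied validation set should be used for validation as-is.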
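The expected behavior can be sketched as follows. This is a minimal illustration in plain Python, not swift's actual implementation: `split_custom_datasets`, its parameters other than `dataset_test_ratio`, and the sample rows are hypothetical names introduced here for illustration.

```python
import random
from typing import List, Optional, Tuple


def split_custom_datasets(
    train_rows: List[dict],
    val_rows: Optional[List[dict]],
    dataset_test_ratio: float = 0.01,
    seed: int = 42,
) -> Tuple[List[dict], List[dict]]:
    """Hypothetical helper (not part of swift) showing the expected logic."""
    if val_rows:
        # An explicit validation set was supplied: keep both sets untouched.
        return list(train_rows), list(val_rows)
    # No validation set: hold out a fraction of the training rows instead.
    rows = list(train_rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * dataset_test_ratio)
    return rows[n_val:], rows[:n_val]


# Made-up rows standing in for faq_train.jsonl and faq_valid.jsonl.
train = [{"query": f"q{i}"} for i in range(200)]
valid = [{"query": f"v{i}"} for i in range(20)]

train_out, val_out = split_custom_datasets(train, valid)
train_only, held_out = split_custom_datasets(train, None)
```

With both files supplied, `train_out` and `val_out` keep every row of their source files. The ratio-based split is applied only in the second call, where no validation set is given.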

swift version: commit ID 845ac0ce46c4b904a809e3570e8dfb830f9b4e00

tiandiweizun, May 10 '24 07:05