supervised finetune in chinese

Open 18335100284 opened this issue 2 years ago • 1 comments

I want to fine-tune bloom_1.1b on a Chinese dataset by running run_chinese.sh. But where is the ds_config.json file used in run_chinese.sh?

18335100284 avatar Apr 18 '23 05:04 18335100284

It's under the training/utils folder, in a file called ds_utils.py.
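For illustration only, here is a minimal sketch of how a DeepSpeed config can be built as a Python dict instead of being read from a ds_config.json file. The helper name `build_ds_config` and its default values are assumptions made for this example, not the actual contents of ds_utils.py; the dict keys are standard DeepSpeed config fields.

```python
import json


def build_ds_config(micro_batch_size=4, num_gpus=1, grad_accum_steps=1, zero_stage=2):
    """Hypothetical helper: return a DeepSpeed config as a plain dict."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # DeepSpeed expects this to equal micro batch * accumulation steps * GPUs.
        "train_batch_size": micro_batch_size * grad_accum_steps * num_gpus,
        "zero_optimization": {"stage": zero_stage},
        "fp16": {"enabled": True},
    }


# Example values only; adjust to your hardware.
config = build_ds_config(micro_batch_size=4, num_gpus=8)
print(json.dumps(config, indent=2))
```

A dict like this can be passed to `deepspeed.initialize` through its `config` argument, which would explain why no separate JSON file is needed.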

ruihan0495 avatar Apr 18 '23 08:04 ruihan0495

The scripts in training_scripts/other_language/ are very old and haven't been updated. We will work on fixing them, but in the meantime you can simply add these args to any of the English data example scripts to run the Chinese case:

--data_path wangrui6/Zhihu-KOL Cohere/miracl-zh-queries-22-12 Hello-SimpleAI/HC3-Chinese mkqa-Chinese --data_split 10,0,0 --model_name_or_path bigscience/bloom-1b1 \
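As a rough sketch, the arguments above can be assembled into a full command line like this. The dataset names, split, and model name are copied from the line above; the `main.py` entry point and the helper name `chinese_sft_args` are assumptions for illustration, so substitute whatever launcher and script your English example uses.

```python
import shlex

# Dataset names copied from the arguments above.
DATA_PATHS = (
    "wangrui6/Zhihu-KOL",
    "Cohere/miracl-zh-queries-22-12",
    "Hello-SimpleAI/HC3-Chinese",
    "mkqa-Chinese",
)


def chinese_sft_args(data_paths=DATA_PATHS, data_split="10,0,0",
                     model="bigscience/bloom-1b1"):
    """Hypothetical helper: return the Chinese step 1 arguments as a list."""
    return [
        "--data_path", *data_paths,
        "--data_split", data_split,
        "--model_name_or_path", model,
    ]


# "main.py" is a placeholder for the training script of an English example.
command = ["python", "main.py", *chinese_sft_args()]
print(shlex.join(command))
```

The printed line can be pasted into an existing example script in place of its own data and model arguments. The `10,0,0` split appears to assign all of the data to step 1.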

On the other hand, please understand that the Chinese data we found only supports step 1 (SFT) training, so we haven't verified step 2 or step 3 training for Chinese. You would need to explore that, and find suitable datasets, yourself.

conglongli avatar Apr 24 '23 04:04 conglongli