Supervised fine-tuning in Chinese
I want to fine-tune bloom_1.1b on a Chinese dataset by running run_chinese.sh. But where is the ds_config.json file that run_chinese.sh uses?
There is no standalone ds_config.json. The DeepSpeed config is generated in code by ds_utils.py, under the training/utils folder.
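Because ds_utils.py assembles the DeepSpeed config as a Python dict at runtime, there is no config file to find on disk. If you want a standalone ds_config.json anyway (for example, to inspect or tweak the settings), here is a minimal sketch. The helpers build_ds_config and write_ds_config are hypothetical. The values are illustrative defaults, not the ones ds_utils.py actually uses. Only the key names are standard DeepSpeed config fields.

```python
import json
from pathlib import Path


def build_ds_config(micro_batch_per_gpu: int = 8,
                    grad_accum_steps: int = 1,
                    world_size: int = 1,
                    zero_stage: int = 2,
                    fp16: bool = True) -> dict:
    """Return a DeepSpeed config dict (hypothetical helper, illustrative values)."""
    return {
        # DeepSpeed requires: train_batch_size ==
        #   micro batch per GPU * gradient accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "steps_per_print": 10,
        # ZeRO stage (0-3) controls how optimizer state, gradients and params are sharded.
        "zero_optimization": {"stage": zero_stage},
        "fp16": {"enabled": fp16},
        "gradient_clipping": 1.0,
    }


def write_ds_config(path: str, **kwargs) -> dict:
    """Build the config and dump it to a standalone JSON file at `path`."""
    config = build_ds_config(**kwargs)
    Path(path).write_text(json.dumps(config, indent=2))
    return config
```

For example, write_ds_config("ds_config.json", micro_batch_per_gpu=4, grad_accum_steps=2, world_size=8) writes a file with train_batch_size set to 64 (4 × 2 × 8). DeepSpeed checks that these three batch-size fields are consistent, so computing train_batch_size from the other two values avoids a startup error.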
The scripts in training_scripts/other_language/ are outdated and have not been kept up to date. We will work on fixing them. In the meantime, you can apply the following args to any of the English example scripts to run the Chinese case:
--data_path wangrui6/Zhihu-KOL Cohere/miracl-zh-queries-22-12 Hello-SimpleAI/HC3-Chinese mkqa-Chinese --data_split 10,0,0 --model_name_or_path bigscience/bloom-1b1 \
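If you generate launch commands programmatically instead of editing an English example script by hand, the same substitution can be sketched in Python. The names override_args and CHINESE_ARGS are hypothetical. The sketch assumes flags are written as separate tokens (--flag value), not --flag=value. The English-style base command is only an illustration, not copied from a specific script.

```python
import shlex

# The Chinese-case args from above, keyed by flag.
CHINESE_ARGS = {
    "--data_path": ["wangrui6/Zhihu-KOL", "Cohere/miracl-zh-queries-22-12",
                    "Hello-SimpleAI/HC3-Chinese", "mkqa-Chinese"],
    "--data_split": ["10,0,0"],
    "--model_name_or_path": ["bigscience/bloom-1b1"],
}


def override_args(base_argv, overrides):
    """Return a copy of base_argv with the values of selected flags replaced.

    A flag's values are all tokens up to the next token starting with "--".
    Flags in `overrides` that are missing from base_argv are appended at the end.
    """
    result = []
    seen = set()
    i = 0
    while i < len(base_argv):
        token = base_argv[i]
        if token in overrides:
            # Emit the flag with its new values.
            result.append(token)
            result.extend(overrides[token])
            seen.add(token)
            i += 1
            # Skip the flag's old values.
            while i < len(base_argv) and not base_argv[i].startswith("--"):
                i += 1
        else:
            result.append(token)
            i += 1
    for flag, values in overrides.items():
        if flag not in seen:
            result.append(flag)
            result.extend(values)
    return result


# Illustrative English-style step 1 argv (values are examples only).
english_argv = ["main.py", "--data_path", "Dahoas/rm-static",
                "--data_split", "2,4,4",
                "--model_name_or_path", "facebook/opt-1.3b",
                "--zero_stage", "2"]
chinese_argv = override_args(english_argv, CHINESE_ARGS)
print(shlex.join(["deepspeed", *chinese_argv]))
```

Other flags, such as --zero_stage here, pass through unchanged, so the rest of the English script's settings carry over to the Chinese run.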
Please note that the Chinese datasets we found only support step 1 (SFT) training, so we have not verified step 2 or step 3 training for Chinese. You would need to explore those steps, and find suitable datasets for them, yourself.