WizardLM
Fine-tuned from WizardLM/WizardCoder-15B-V1.0, but no effect
Hi, I fine-tuned WizardLM/WizardCoder-15B-V1.0 on a machine with 8*V100 32G, trained for 22 hours, and then tested with checkpoint 1600.
The results are very unsatisfactory; the model seems to do no reasoning at all. Could someone help me figure out what the problem is?
Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
### Instruction:
Generate unit test code only for the public methods; require 100% code coverage.
### Input:
public class TransactionGlobalServiceImpl implements TransactionGlobalService {
    @Autowired
    private TransactionGlobalMapper transactionGlobalMapper;

    @Override
    public TransactionGlobal queryTransactionGlobal(String bPartnerId) {
        QueryWrapper wrapper = new QueryWrapper();
        wrapper.in("bpartner_id", bPartnerId);
        wrapper.in("is_deleted", 0);
        wrapper.orderByAsc("create_date");
        List<TransactionGlobal> transactionGlobals = transactionGlobalMapper.selectList(wrapper);
        if (!CollectionUtils.isAnyEmpty(transactionGlobals)) {
            return transactionGlobals.get(0);
        } else {
            return null;
        }
    }
}
### Response:<|endoftext|>
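For context, here is a minimal Python sketch of how a training example in this Alpaca-style format could be assembled and terminated with the <|endoftext|> token. The record field names and the build_training_text helper are my own illustration, not necessarily what train_wizardcoder.py actually does:

# Sketch only: assemble an Alpaca-style training example like the one above and
# terminate it with the end-of-text token. The field names (instruction/input/output)
# and this helper are illustrative assumptions, not the actual training code.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context.\n"
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:"
)

def build_training_text(record, eos_token="<|endoftext|>"):
    """Concatenate prompt, target output, and EOS for causal LM fine-tuning."""
    prompt = PROMPT_WITH_INPUT.format(instruction=record["instruction"],
                                      input=record["input"])
    return prompt + record["output"] + eos_token

example = {
    "instruction": "Generate unit test code only for the public methods; require 100% code coverage.",
    "input": "public class TransactionGlobalServiceImpl { ... }",
    "output": "import org.junit.jupiter.api.Test;\n// generated tests ...",
}
print(build_training_text(example))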
The training script I used is as follows:
deepspeed train_wizardcoder.py \
--model_name_or_path "/data/models/WizardLM/WizardCoder-15B-V1.0" \
--data_path "/data/datasets/java_wizard" \
--output_dir "/data/models/wizard_java_from_starchat" \
--num_train_epochs 3 \
--model_max_length 2048 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 50 \
--save_total_limit 2 \
--learning_rate 2e-5 \
--warmup_steps 30 \
--logging_steps 2 \
--lr_scheduler_type "cosine" \
--gradient_checkpointing True \
--deepspeed configs/deepspeed_config.json \
--fp16 True 2>&1 | tee /data/logs/deep.log
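The configs/deepspeed_config.json referenced above is not shown. Purely as an assumption, a typical ZeRO stage-2 + fp16 config that defers the batch-size and fp16 settings to the Hugging Face Trainer via "auto" values might look like the following sketch (written from Python for convenience); the actual file used in this run may differ:

# Hypothetical sketch of configs/deepspeed_config.json (not the actual file used):
# ZeRO stage 2 with fp16, letting the HF Trainer fill in the "auto" values from
# the command-line flags shown above.
import json

ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_scatter": True,
    },
}

with open("configs/deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)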
The training dataset is as follows:
Neither the input nor the output contains line breaks; could that be what is hurting the fine-tuning?
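For reference, in the usual Alpaca-style JSON data format each record stores line breaks as escaped \n inside the strings, so a record that appears as a single line in the raw file can still contain newlines once it is loaded. A small illustration (the instruction/input/output field names are an assumption about the expected schema):

# Illustration only: an Alpaca-style record whose strings contain newlines.
# On disk the record is one line of JSON, because the newlines are escaped as \n.
import json

record = {
    "instruction": "Generate unit test code only for the public methods; require 100% code coverage.",
    "input": "public class TransactionGlobalServiceImpl {\n    // ...\n}",
    "output": "import org.junit.jupiter.api.Test;\n// generated tests ...",
}

line = json.dumps(record, ensure_ascii=False)
print(line)                        # a single line in the raw file
print(json.loads(line)["input"])   # the newlines are restored after loading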
This is a strange problem. I suggest a small experiment: fine-tune the model on Code Alpaca and check whether the same problem occurs. Fine-tuning for just 1 epoch with a 512 sequence length should be enough.
How long would fine-tuning take on an 8*A100 40G machine? I have 78,000 rows of data.
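For a rough sense of scale, here is back-of-the-envelope arithmetic for the optimizer step count, assuming the same flags as the command above (per-device batch 4, gradient accumulation 2, 8 GPUs, 3 epochs). Actual wall-clock time depends on the per-step throughput of the A100s, which cannot be estimated from the information in this thread:

# Back-of-the-envelope step count only; wall-clock time depends on per-step
# throughput on the 8*A100 machine, which is not known here.
rows = 78_000
gpus = 8
per_device_batch = 4
grad_accum = 2
epochs = 3

effective_batch = gpus * per_device_batch * grad_accum   # 64 sequences per optimizer step
steps_per_epoch = -(-rows // effective_batch)            # ceil(78000 / 64) = 1219
total_steps = steps_per_epoch * epochs                   # 3657
print(effective_batch, steps_per_epoch, total_steps)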