ChatGLM-6B
Loss decreases very slowly during ptuning fine-tuning. What can I do?
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
I am using the code under ptuning with the AdvertiseGen training data. The parameters are set as follows:
PRE_SEQ_LEN=32
LR=1e-2
CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /home/chatGPT/model/chatGLMModel/chatGLMHuggingFace/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 6000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4 > train.log 2>&1 &
After training for 6000 steps, the result is as follows:
{
  "epoch": 1.68,
  "train_loss": 4.087045756022135,
  "train_runtime": 389852.3508,
  "train_samples": 114599,
  "train_samples_per_second": 0.492,
  "train_steps_per_second": 0.015
}
The loss is still very high, and when I use web_demo the model no longer answers questions normally.
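As a sanity check, the reported metrics can be reproduced from the command-line arguments. The sketch below only does arithmetic on values copied from this issue; the variable names are mine, and `num_devices = 1` is an assumption based on `CUDA_VISIBLE_DEVICES=0`.

```python
# Values copied from the training command and the reported metrics.
per_device_batch = 2          # --per_device_train_batch_size
grad_accum_steps = 16         # --gradient_accumulation_steps
num_devices = 1               # assumption: a single GPU (CUDA_VISIBLE_DEVICES=0)
max_steps = 6000              # --max_steps
train_samples = 114599        # "train_samples"
train_runtime_s = 389852.3508 # "train_runtime", in seconds

# Samples consumed per optimizer step and over the whole run.
effective_batch = per_device_batch * grad_accum_steps * num_devices
samples_seen = effective_batch * max_steps

epochs = samples_seen / train_samples
samples_per_second = samples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s
runtime_hours = train_runtime_s / 3600

print(effective_batch, samples_seen)                      # 32 192000
print(round(epochs, 2))                                   # 1.68
print(round(samples_per_second, 3), round(steps_per_second, 3))  # 0.492 0.015
print(round(runtime_hours, 1))                            # 108.3
```

So the run covered about 1.68 passes over the data with an effective batch size of 32, and took roughly 108 hours, which matches the reported `epoch`, `train_samples_per_second`, and `train_steps_per_second`.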
Expected Behavior
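After fine-tuning finishes, the loss should have dropped to a reasonable value, and the fine-tuned model should be able to answer questions.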
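To judge whether the loss is actually going down, it may help to look at the per-step values rather than the single `train_loss` figure, which in the Hugging Face `Trainer` is, as far as I know, an average over the whole run. The sketch below assumes that the checkpoints written by `--save_steps 1000` contain a `trainer_state.json` with a `log_history` list, as the stock `Trainer` produces; whether the ptuning trainer writes the same file is an assumption. The helper names and the sample entries are hypothetical and are not values from this run.

```python
import json


def load_log_history(state_path):
    """Read the log_history list from a Trainer-style trainer_state.json."""
    with open(state_path, encoding="utf-8") as f:
        return json.load(f)["log_history"]


def loss_curve(log_history):
    """Return (step, loss) pairs, skipping entries without a training loss."""
    return [
        (entry["step"], entry["loss"])
        for entry in log_history
        if "step" in entry and "loss" in entry
    ]


# Made-up placeholder entries that only illustrate the expected shape.
example_history = [
    {"step": 10, "loss": 5.0, "learning_rate": 0.01},
    {"step": 20, "loss": 4.5, "learning_rate": 0.01},
    {"step": 20, "train_runtime": 1.0},  # summary entry without "loss"
]

for step, loss in loss_curve(example_history):
    print(step, loss)
```

With a real run, `load_log_history` would be pointed at a file such as `trainer_state.json` inside one of the checkpoint directories under the `--output_dir`, if it exists there.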
Steps To Reproduce
Run ./train.sh; the exact parameters are described above.
Environment
- OS:centos
- Python:3.8
- Transformers:4.28.0
- PyTorch:1.13.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True
Anything else?
No response