ChatGLM-6B: Loss decreases very slowly during ptuning fine-tuning. What can be done?


Open niexufei opened this issue 1 year ago • 0 comments

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

I am using the code under ptuning with the AdvertiseGen training data. The parameters are set as follows:

PRE_SEQ_LEN=32
LR=1e-2

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /home/chatGPT/model/chatGLMModel/chatGLMHuggingFace/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 6000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4 > train.log 2>&1 &
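Since the command redirects all output to train.log and logs every 10 steps, the loss trajectory can be inspected directly from that file. The following is a minimal sketch, assuming each logging event appears in the log as a Python dict literal containing a 'loss' key (the usual shape of Hugging Face Trainer log lines, but not confirmed by this issue). The helper names `parse_loss_log` and `window_means` are hypothetical and not part of the ChatGLM-6B repository.

```python
import ast
import os
import re

# Assumption: a logging event looks like {'loss': ..., 'learning_rate': ..., 'epoch': ...}
DICT_PATTERN = re.compile(r"\{[^{}]*\}")


def parse_loss_log(lines):
    """Return the training loss values found in Trainer-style log lines.

    Lines without a dict literal, or whose dict has no 'loss' key
    (for example the final summary with 'train_loss'), are skipped.
    """
    losses = []
    for line in lines:
        match = DICT_PATTERN.search(line)
        if match is None:
            continue
        try:
            record = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError):
            continue  # progress-bar residue, warnings, partial lines
        if isinstance(record, dict) and "loss" in record:
            losses.append(float(record["loss"]))
    return losses


def window_means(values, window):
    """Average consecutive, non-overlapping windows of `window` values."""
    means = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


if __name__ == "__main__":
    log_path = "train.log"  # the file the training command redirects to
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8", errors="replace") as f:
            losses = parse_loss_log(f)
        # With --logging_steps 10, 100 log entries span about 1000 optimizer steps.
        for i, mean in enumerate(window_means(losses, 100)):
            print(f"log entries {i * 100}-{i * 100 + 99}: mean loss {mean:.4f}")
```

Comparing the windowed means shows whether the loss is still falling slowly, has plateaued, or diverged at some point, which the single averaged train_loss value cannot distinguish.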

After training for 6000 steps, the result is as follows:

{
  "epoch": 1.68,
  "train_loss": 4.087045756022135,
  "train_runtime": 389852.3508,
  "train_samples": 114599,
  "train_samples_per_second": 0.492,
  "train_steps_per_second": 0.015
}

The loss is still very high, and when using web_demo the model no longer answers questions properly.
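As a sanity check, the reported summary can be reproduced from the command-line arguments. The sketch below uses only values from this issue and assumes a single GPU (CUDA_VISIBLE_DEVICES=0). It shows that the run covered roughly 1.68 passes over the data at about 65 seconds per optimizer step, so the reported train_loss is an average over the whole run rather than the final loss.

```python
# Values copied from the training command and the reported summary.
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_devices = 1  # assumption: CUDA_VISIBLE_DEVICES=0 exposes one GPU
max_steps = 6000
train_samples = 114599
train_runtime_s = 389852.3508

# Samples consumed per optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = effective_batch * max_steps

epochs = samples_seen / train_samples
samples_per_second = samples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s
seconds_per_step = train_runtime_s / max_steps
runtime_days = train_runtime_s / 86400

print(f"effective batch size: {effective_batch}")
print(f"samples seen:         {samples_seen}")
print(f"epochs:               {epochs:.2f}")
print(f"samples per second:   {samples_per_second:.3f}")
print(f"steps per second:     {steps_per_second:.3f}")
print(f"seconds per step:     {seconds_per_step:.1f}")
print(f"runtime in days:      {runtime_days:.1f}")
```

The computed epoch count (1.68), samples per second (0.492), and steps per second (0.015) match the reported summary, which suggests the log is internally consistent and the run took about 4.5 days.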

Expected Behavior

After fine-tuning, the loss should drop to a reasonable value, and the fine-tuned model should still be able to answer questions.

Steps To Reproduce

Run ./train.sh. The specific parameters are described above.

Environment

- OS:centos
- Python:3.8
- Transformers:4.28.0
- PyTorch:1.13.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Anything else?

No response

May 29 '23 03:05 niexufei
