
[BUG/Help] P-tuning loss is close to 0 with default parameters

Open · candowu opened this issue Apr 21 '23 · 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

The loss is close to 0. Full log attached: main.log

Expected Behavior

No response

Steps To Reproduce

1. Download and extract the AdvertiseGen training data.
2. Edit train.sh and comment out the last line: #--quantization_bit 4.
3. Run sh train.sh (see the sketch below).
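The same steps in shell form, as a minimal sketch (the archive name and the ptuning/ path are taken from the ChatGLM-6B repo layout, not from this issue):

```sh
# Step 1: unpack the AdvertiseGen dataset into the ptuning directory
# (archive name assumed from the ChatGLM-6B README; download it first)
tar -xzf AdvertiseGen.tar.gz -C ptuning/
# Step 2: comment out the last flag in ptuning/train.sh, i.e.
#   --quantization_bit 4   ->   # --quantization_bit 4
# Step 3: launch training
cd ptuning && sh train.sh
```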

Environment

- OS: CentOS
- Python: 3.9.16
- Transformers: 4.27.1
- PyTorch: 2.0.0
- CUDA Support: true (Tesla T4)
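
A quick way to confirm the versions and GPU reported above (a sketch; not from the issue):

```sh
# Print framework versions and CUDA availability
python3 -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"
# Should report the GPU model, e.g. Tesla T4
nvidia-smi --query-gpu=name --format=csv,noheader
```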

Anything else?

No response

candowu commented Apr 21 '23

I ran into the same problem. I set prefix_projection=True, commented out quantization_bit 4, and trained in half precision. GPU: V100, 32 GB; torch 2.0.0; transformers 4.27.1; python 3.8.10. My train.sh:

```sh
PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --prefix_projection \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 512 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 5 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN
```

The training loss stays at 0: [image]

smallsmallwood commented Apr 22 '23

In my case, the cause may have been that my code was up to date but the model was not. After updating the model to the latest version, the problem no longer occurred.
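
A minimal sketch of updating the model, assuming the checkpoint was cloned locally from the Hugging Face hub (the local path is illustrative):

```sh
cd chatglm-6b    # local clone of https://huggingface.co/THUDM/chatglm-6b
git pull         # pick up the latest modeling code (e.g. modeling_chatglm.py)
git lfs pull     # refresh the weight files tracked by git-lfs
```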

candowu commented Apr 22 '23

I adjusted the learning rate, and the loss now decreases normally.
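
The comment does not say which value worked; purely as a hypothetical illustration, the change lives at the top of train.sh:

```sh
PRE_SEQ_LEN=128
LR=2e-3   # hypothetical value, lowered from the default 2e-2; tune for your setup
```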

smallsmallwood commented Apr 23 '23

After commenting out quantization, what learning rate is appropriate?

w1ida commented May 06 '23