
[BUG/Help] P-tuning loss is close to 0 with default parameters

Open · candowu opened this issue Apr 21 '23 · 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

The loss is close to 0. Full log attached: main.log

Expected Behavior

No response

Steps To Reproduce

1. Download and extract the AdvertiseGen training data.
2. Edit train.sh and comment out the last line: #--quantization_bit 4.
3. Run sh train.sh (see the sketch below).
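The same steps in shell form, as a minimal sketch (the archive name and the ptuning/ path are taken from the ChatGLM-6B repo layout, not from this issue):

```sh
# Step 1: unpack the AdvertiseGen dataset into the ptuning directory
# (archive name assumed from the ChatGLM-6B README; download it first)
tar -xzf AdvertiseGen.tar.gz -C ptuning/
# Step 2: comment out the last flag in ptuning/train.sh, i.e.
#   --quantization_bit 4   ->   # --quantization_bit 4
# Step 3: launch training
cd ptuning && sh train.sh
```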

Environment

- OS: CentOS
- Python: 3.9.16
- Transformers: 4.27.1
- PyTorch: 2.0.0
- CUDA Support: true (Tesla T4)
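
A quick way to confirm the versions and GPU reported above (a sketch; not from the issue):

```sh
# Print framework versions and CUDA availability
python3 -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"
# Should report the GPU model, e.g. Tesla T4
nvidia-smi --query-gpu=name --format=csv,noheader
```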

Anything else?

No response

candowu commented Apr 21 '23

I ran into the same problem. I set prefix_projection=True, commented out quantization_bit 4, and trained in half precision. GPU: V100, 32 GB; torch 2.0.0; transformers 4.27.1; python 3.8.10. My train.sh:

```sh
PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --prefix_projection \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 512 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 5 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN
```

The training loss stays at 0: [image]

smallsmallwood commented Apr 22 '23

In my case, the cause may have been that my code was up to date but the model was not. After updating the model to the latest version, the problem no longer occurred.
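
A minimal sketch of updating the model, assuming the checkpoint was cloned locally from the Hugging Face hub (the local path is illustrative):

```sh
cd chatglm-6b    # local clone of https://huggingface.co/THUDM/chatglm-6b
git pull         # pick up the latest modeling code (e.g. modeling_chatglm.py)
git lfs pull     # refresh the weight files tracked by git-lfs
```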

candowu commented Apr 22 '23

I adjusted the learning rate, and the loss now decreases normally.
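
The comment does not say which value worked; purely as a hypothetical illustration, the change lives at the top of train.sh:

```sh
PRE_SEQ_LEN=128
LR=2e-3   # hypothetical value, lowered from the default 2e-2; tune for your setup
```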

smallsmallwood commented Apr 23 '23

After commenting out quantization, what learning rate is appropriate?

w1ida commented May 06 '23