During P-Tuning v2 fine-tuning the loss suddenly becomes abnormally large, then drops to 0
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
During P-Tuning v2 fine-tuning the loss suddenly becomes abnormally large, and immediately afterwards it drops to 0 and stays there. Could some bug be causing a gradient explosion? Lowering the learning rate all the way from 2e-2 down to 1e-3 did not help.
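For context, a minimal sketch of how this spike-then-zero pattern could be caught automatically, assuming the standard Hugging Face `Trainer` used by `ptuning/main.py` (the `LossSpikeGuard` callback name and the `spike_factor` threshold are made up for illustration):

```python
import math
from transformers import TrainerCallback

class LossSpikeGuard(TrainerCallback):
    """Hypothetical callback: flag loss spikes and stop when the loss hits 0 or NaN."""

    def __init__(self, spike_factor: float = 10.0):
        self.spike_factor = spike_factor  # how large a jump counts as a "spike"
        self.last_loss = None

    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is None:
            return
        if loss == 0.0 or math.isnan(loss):
            # A zero or NaN loss right after a spike usually means the parameters
            # (or the fp16 loss scale) are already corrupted; stop and inspect.
            control.should_training_stop = True
        elif self.last_loss and loss > self.spike_factor * self.last_loss:
            print(f"loss spiked {self.last_loss:.4f} -> {loss:.4f} at step {state.global_step}")
        self.last_loss = loss
```

If used, it would be attached with `trainer.add_callback(LossSpikeGuard())` before `trainer.train()`.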
Expected Behavior
Steps To Reproduce
```shell
python main.py ^
    --do_train ^
    --do_eval ^
    --train_file D:/AI/ChatGLM-6B/ptuning/dataset/train.json ^
    --validation_file D:/AI/ChatGLM-6B/ptuning/dataset/test.json ^
    --prompt_column prompt ^
    --response_column response ^
    --model_name_or_path THUDM/chatglm-6b ^
    --output_dir D:/AI/ChatGLM-6B/ptuning/output/adgen-chatglm-6b-pt-700-80w-5e-3_resume ^
    --max_source_length 640 ^
    --max_target_length 320 ^
    --per_device_train_batch_size 1 ^
    --per_device_eval_batch_size 1 ^
    --gradient_accumulation_steps 16 ^
    --predict_with_generate ^
    --max_steps 30000 ^
    --logging_steps 50 ^
    --save_steps 1000 ^
    --learning_rate 1e-3 ^
    --pre_seq_len 700
```
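For reference, a small hypothetical helper (not part of the repo) that could be called right after `loss.backward()` and before the optimizer step, to confirm whether the gradients of the trainable parameters actually explode when the loss spikes:

```python
import torch

def grad_norm(model: torch.nn.Module) -> float:
    """Total L2 norm of the gradients of all trainable parameters.

    Hypothetical diagnostic: if this value blows up on the step where the loss
    spikes, the problem is a genuine gradient explosion rather than a logging issue.
    """
    total = 0.0
    for p in model.parameters():
        if p.requires_grad and p.grad is not None:
            total += p.grad.detach().float().norm(2).item() ** 2
    return total ** 0.5
```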
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response