ChatGLM-6B
[BUG/Help] Does P-Tuning v2 support data parallelism with DeepSpeed?
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Does P-Tuning v2 support data parallelism with DeepSpeed? I found that the training loss from running P-Tuning v2 under DeepSpeed differs greatly from the loss obtained when using P-Tuning v2 alone or DeepSpeed data parallelism (full fine-tuning) alone.
Results:
Full fine-tuning (dp:4)
train metrics
epoch = 9.99
train_loss = 13.3965
train_runtime = 8:35:53.83
train_samples = 114599
train_samples_per_second = 37.023
train_steps_per_second = 0.154
P-Tuning v2 (dp:3)
train metrics
epoch = 9.63
train_loss = 89.8907
train_runtime = 3:38:44.99
train_samples = 114599
train_samples_per_second = 87.314
train_steps_per_second = 0.014
P-Tuning v2
train metrics
epoch = 10.0
train_loss = 12.7032
train_runtime = 11:12:06.64
train_samples = 114599
train_samples_per_second = 28.418
train_steps_per_second = 0.014
Expected Behavior
No response
Steps To Reproduce
Commands:
Full fine-tuning (dp:4)
LR=1e-4
MASTER_PORT=$(shuf -n 1 -i 10000-65535)
deepspeed --include localhost:4,5,6,7 --master_port $MASTER_PORT main.py \
--deepspeed deepspeed.json \
--do_train \
--train_file /data/nfs/llm/data/AdvertiseGen/train.json \
--test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
--prompt_column content \
--response_column summary \
--overwrite_cache \
--model_name_or_path /data/nfs/llm/model/chatglm-6b \
--output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-$LR \
--overwrite_output_dir \
--max_source_length 64 \
--max_target_length 64 \
--per_device_train_batch_size 30 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 2 \
--predict_with_generate \
--num_train_epochs 10 \
--logging_steps 10 \
--save_steps 200 \
--learning_rate $LR \
--fp16
P-Tuning v2 (dp:3)
PRE_SEQ_LEN=128
LR=2e-2
deepspeed --include localhost:1,2,3 --master_port 29001 main.py \
--deepspeed deepspeed.json \
--do_train \
--train_file /data/nfs/llm/data/AdvertiseGen/train.json \
--validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
--prompt_column content \
--response_column summary \
--overwrite_cache \
--model_name_or_path /data/nfs/llm/model/chatglm-6b \
--output_dir /home/guodong.li/output/adgen-chatglm-6b-pt \
--overwrite_output_dir \
--max_source_length 64 \
--max_target_length 64 \
--per_device_train_batch_size 128 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 16 \
--predict_with_generate \
--num_train_epochs 10 \
--logging_steps 10 \
--save_steps 100 \
--learning_rate $LR \
--pre_seq_len $PRE_SEQ_LEN
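The deepspeed.json passed via --deepspeed above is not included in this report. Purely as an illustration (the actual file may differ), a minimal ZeRO stage 2 config for such a run might look like the sketch below; note that it does not set gradient_accumulation_steps, which turns out to matter (see the comment at the end of this issue).

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "zero_optimization": {
    "stage": 2
  },
  "fp16": {
    "enabled": "auto"
  }
}
```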
P-Tuning v2
PRE_SEQ_LEN=128
LR=2e-2
CUDA_VISIBLE_DEVICES=0 python3 main.py \
--do_train \
--train_file /data/nfs/llm/data/AdvertiseGen/train.json \
--validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
--prompt_column content \
--response_column summary \
--overwrite_cache \
--model_name_or_path /data/nfs/llm/model/chatglm-6b \
--output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
--overwrite_output_dir \
--max_source_length 64 \
--max_target_length 64 \
--per_device_train_batch_size 128 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 16 \
--predict_with_generate \
--num_train_epochs 10 \
--logging_steps 10 \
--save_steps 100 \
--learning_rate $LR \
--pre_seq_len $PRE_SEQ_LEN
Environment
- OS: CentOS 7
- Python: 3.10
- Transformers: 4.28.0
- PyTorch: 1.13.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
I ran into this problem as well. During P-Tuning v2 training, DeepSpeed does not seem to pick up gradient_accumulation_steps automatically. After manually setting gradient_accumulation_steps in deepspeed_config.json, the loss returned to normal.
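For reference, a minimal sketch of that manual fix, assuming a ZeRO stage 2 style config and the --gradient_accumulation_steps 16 used in the command above (the exact deepspeed_config.json used is not part of this report):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": 16,
  "zero_optimization": {
    "stage": 2
  },
  "fp16": {
    "enabled": "auto"
  }
}
```

With the Hugging Face Trainer integration, gradient_accumulation_steps can also be set to "auto", in which case the value from --gradient_accumulation_steps is filled in automatically and the config and the command line cannot drift apart.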