
[BUG/Help] Does P-Tuning v2 support data parallelism with DeepSpeed?

Open liguodongiot opened this issue 1 year ago • 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Does P-Tuning v2 support data parallelism with DeepSpeed? I found that when running P-Tuning v2 with DeepSpeed, the resulting loss differs greatly from the loss when running P-Tuning v2 alone or DeepSpeed data parallelism alone.

Results:

Full fine-tuning (dp:4)

 train metrics 
  epoch                    =       9.99
  train_loss               =    13.3965
  train_runtime            = 8:35:53.83
  train_samples            =     114599
  train_samples_per_second =     37.023
  train_steps_per_second   =      0.154

P-Tuning v2 (dp:3)

 train metrics 
  epoch                    =       9.63
  train_loss               =    89.8907
  train_runtime            = 3:38:44.99
  train_samples            =     114599
  train_samples_per_second =     87.314
  train_steps_per_second   =      0.014

P-Tuning v2

 train metrics 
  epoch                    =        10.0
  train_loss               =     12.7032
  train_runtime            = 11:12:06.64
  train_samples            =      114599
  train_samples_per_second =      28.418
  train_steps_per_second   =       0.014

Expected Behavior

No response

Steps To Reproduce

Commands:

Full fine-tuning (dp:4)

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --include localhost:4,5,6,7 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 30 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 200 \
    --learning_rate $LR \
    --fp16

P-Tuning v2 (dp:3)

PRE_SEQ_LEN=128
LR=2e-2

deepspeed --include localhost:1,2,3 --master_port 29001 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN

P-Tuning v2

PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN
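
Note: the deepspeed.json passed via --deepspeed in the two DeepSpeed runs above is not included in this report. As a rough sketch only (the actual file may differ), configs used with the Hugging Face Trainer integration typically leave the batch-related fields as "auto" so that the Trainer fills them in from the command-line arguments; the ZeRO stage 2 setting below is an assumption:

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 2
  },
  "fp16": {
    "enabled": "auto"
  }
}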

Environment

- OS: CentOS 7
- Python: 3.10
- Transformers: 2.28.0
- PyTorch: 1.13.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

liguodongiot commented on Apr 17, 2023

I ran into this problem as well. During P-Tuning v2 training, DeepSpeed apparently did not pick up gradient_accumulation_steps automatically. After setting gradient_accumulation_steps manually in deepspeed_config.json, the loss returned to normal.

Arrivederci commented on Apr 19, 2023
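
A minimal sketch of the workaround described in the comment above, assuming the P-Tuning v2 (dp:3) settings from this issue (per_device_train_batch_size 128, gradient_accumulation_steps 16, 3 GPUs); the actual deepspeed_config.json is not shown here, so treat the values as illustrative only:

{
  "train_micro_batch_size_per_gpu": 128,
  "gradient_accumulation_steps": 16,
  "train_batch_size": 6144,
  "zero_optimization": {
    "stage": 2
  }
}

DeepSpeed requires train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs (128 × 16 × 3 = 6144 for the dp:3 run). Alternatively, the batch-size fields can stay "auto" and only gradient_accumulation_steps be pinned explicitly, which matches the fix reported in the comment above.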