Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

None

Expected Behavior

No response

Steps To Reproduce

None

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

May 25 '23 05:05 jby20180901

You'll have to figure this one out yourself.

May 25 '23 07:05 cywjava

It's a new project, so the documentation isn't that complete yet.

May 25 '23 13:05 bookug

+1. As far as I can tell, the docs only cover multi-GPU deployment, not parallel training.

May 25 '23 15:05 ztfmars

PRE_SEQ_LEN=128 LR=2e-2 MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --include localhost:2,3,4,5
--master_port $MASTER_PORT main.py
--deepspeed deepspeed.json
--do_train
--train_file AdvertiseGen/train.json
--test_file AdvertiseGen/dev.json
--prompt_column content
--response_column summary
--overwrite_cache
--model_name_or_path /home/cser/hugo/ChatGLM-6B/chatglm-6b
--output_dir ./output/adgen-chatglm-6b-ft-$LR
--overwrite_output_dir
--max_source_length 64
--max_target_length 64
--per_device_train_batch_size 32
--per_device_eval_batch_size 32
--gradient_accumulation_steps 1
--predict_with_generate
--num_train_epochs 10
--logging_steps 10
--save_steps 1000
--learning_rate $LR
--pre_seq_len $PRE_SEQ_LEN
--fp16

It should work. This is how I run it on four GPUs. The key is to pass pre_seq_len so that the script takes the P-tuning branch. See main.py:

    if model_args.pre_seq_len is not None:
        # P-tuning v2
        model = model.half()
        model.transformer.prefix_encoder.float()
    else:
        # Finetune
        model = model.float()

May 26 '23 06:05 HuuY

I used your method and got the error shown above. Why is that?

May 27 '23 15:05 MathamPollard

You probably didn't add a trailing \ at the end of each argument line.

May 29 '23 03:05 HuuY

GitHub automatically stripped the line-continuation characters when I pasted the command. Add them back yourself.
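
For reference, a sketch of the same command with a trailing backslash restored on every argument line, as the two replies above suggest. The flags and paths are copied from the command posted earlier in the thread; the script name `train_multi_gpu.sh` and the `DRY_RUN` switch are assumptions added here for illustration and are not part of the original command. By default the script only prints the assembled command, so the continuations can be checked before anything is launched.

```shell
#!/usr/bin/env bash
PRE_SEQ_LEN=128
LR=2e-2
MASTER_PORT=$(shuf -n 1 -i 10000-65535)  # random port in 10000-65535 for the launcher

# DRY_RUN is a hypothetical safety switch, not part of the original command.
# It defaults to "echo", so the script only prints the joined command line.
# Launch for real with:  DRY_RUN= bash train_multi_gpu.sh
DRY_RUN=${DRY_RUN-echo}

# Every line except the last ends with a backslash and nothing after it
# (no trailing spaces, no comments), so the shell reads one single command.
$DRY_RUN deepspeed --include localhost:2,3,4,5 \
    --master_port "$MASTER_PORT" main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --test_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /home/cser/hugo/ChatGLM-6B/chatglm-6b \
    --output_dir "./output/adgen-chatglm-6b-ft-$LR" \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 32 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate "$LR" \
    --pre_seq_len "$PRE_SEQ_LEN" \
    --fp16
```

If the printed line shows all flags on one `deepspeed` invocation, the continuations are intact. A flag that the shell reports as "command not found" usually means the backslash on the line before it is missing or is followed by a space.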

May 29 '23 03:05 HuuY

@jby20180901 Have you solved this problem?

Jun 17 '23 07:06 niuhuluzhihao

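@HuuY Hi, this looks like a good approach. Since it runs multi-GPU training in DeepSpeed mode, can I make these changes directly in the original train.sh? In other issues, many people say that simply setting CUDA_VISIBLE_DEVICES=1,2,3 in train.sh also works, but when I tried that, the run time and resource usage were far higher than with a single GPU. Could you give me some guidance?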

Jun 17 '23 07:06 niuhuluzhihao

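[BUG/Help] Why is there no multi-GPU P-tuning tutorial in the README? Or at least add a sentence explaining what to expect. This problem is troubling a lot of people.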
