
[BUG/Help] Why is there no tutorial for multi-GPU P-tuning in the README, or at least a sentence explaining what to expect? This problem has been troubling a lot of people.

Open jby20180901 opened this issue 2 years ago • 10 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Expected Behavior

No response

Steps To Reproduce

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

jby20180901 avatar May 25 '23 05:05 jby20180901

You'll have to work this one out on your own.

cywjava avatar May 25 '23 07:05 cywjava

It's a new project; the documentation isn't that complete yet.

bookug avatar May 25 '23 13:05 bookug

+1. I've only seen multi-GPU deployment documented, nothing on parallel training.

ztfmars avatar May 25 '23 15:05 ztfmars

PRE_SEQ_LEN=128
LR=2e-2
MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --include localhost:2,3,4,5 \
    --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --test_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /home/cser/hugo/ChatGLM-6B/chatglm-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 32 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --fp16

This works; it's how I run it on four GPUs. The key is passing pre_seq_len so that the P-tuning branch is taken. See main.py:

if model_args.pre_seq_len is not None:
    # P-tuning v2
    model = model.half()
    model.transformer.prefix_encoder.float()
else:
    # Finetune
    model = model.float()
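
The command above also hands DeepSpeed a config file via --deepspeed deepspeed.json, which this thread never shows. The ChatGLM-6B ptuning directory should already ship one for its DeepSpeed fine-tuning script; if you need to write one from scratch, a minimal fp16 + ZeRO stage 2 config might look roughly like the sketch below. The values are assumptions, not copied from the repo, so adjust them to your setup:

# Hypothetical minimal DeepSpeed config, written as a heredoc so the whole
# sketch stays in shell. "auto" lets the HF Trainer integration fill in the
# values from its own command-line arguments.
cat > deepspeed.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_scatter": true
  }
}
EOF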

HuuY avatar May 26 '23 06:05 HuuY


[error screenshot] I tried your method and got the error shown in the screenshot above. Why would that happen?


MathamPollard avatar May 27 '23 15:05 MathamPollard


You probably didn't add a \ at the end of each argument line.

HuuY avatar May 29 '23 03:05 HuuY

The line-continuation characters were stripped automatically by GitHub when I pasted the command. Add them back yourself.
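
Concretely, every line of the command except the last must end with a backslash so the shell keeps reading it as one invocation. A tiny self-contained illustration of that shell behaviour (nothing ChatGLM-specific):

# The trailing backslashes are line continuations: printf receives both
# arguments below as part of a single command.
printf '%s\n' \
    first-argument \
    second-argument
# Without the backslashes, the shell would run `printf '%s\n'` on its own and
# then try to execute `first-argument` as a separate command, which fails.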

HuuY avatar May 29 '23 03:05 HuuY

@jby20180901 Have you managed to solve this problem?

niuhuluzhihao avatar Jun 17 '23 07:06 niuhuluzhihao

@HuuY Hi, this looks like a good approach. Since it runs multi-GPU training through DeepSpeed, can I do it by directly modifying the original train.sh? In other issues many people say that simply writing CUDA_VISIBLE_DEVICES=1,2,3 in train.sh also works, but when I actually tried that, the training took far more time and resources than running on a single GPU. Could you give me some guidance?
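
For context, the two launch styles being compared start a different number of worker processes, which is usually where the performance gap comes from: with several GPUs visible to a single python3 process and no distributed launcher, the HF Trainer typically falls back to torch.nn.DataParallel, whereas the deepspeed launcher runs one worker per GPU. A minimal sketch of the difference, assuming a wrapper script whose "$@" holds all of the usual main.py training flags:

# train.sh style: one Python process; CUDA_VISIBLE_DEVICES only controls
# which GPUs that single process is allowed to see.
CUDA_VISIBLE_DEVICES=1,2,3 python3 main.py "$@"

# deepspeed style: the launcher spawns one worker process per GPU listed
# after --include, and the workers train in parallel over a shared port.
deepspeed --include localhost:1,2,3 \
    --master_port $(shuf -n 1 -i 10000-65535) main.py \
    --deepspeed deepspeed.json "$@"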

niuhuluzhihao avatar Jun 17 '23 07:06 niuhuluzhihao