ChatGLM2-6B
[BUG/Help] How to improve p-tuning training speed on multiple GPUs
Is there an existing issue for this?
- [x] I have searched the existing issues
Current Behavior
On an 8-GPU V100 machine I set NUM_GPUS=8 and trained with train.sh, but it is no faster than training on a single GPU; both take about 2 hours. Is something misconfigured?
```bash
PRE_SEQ_LEN=128
LR=2e-2
NUM_GPUS=1
date=$(date +"%Y%m%d%H%M")
torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS main.py \
    --do_train \
    --train_file formated_q_a.json \
    --validation_file formated_q_a.json \
    --preprocessing_num_workers 10 \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path /data/vege/llm/wenda/model/chatglm2-6b \
    --output_dir /data/vege/llm/wenda/model/chatglm2-6b-ptuning-example-output/adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR-$date \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 128 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```
Expected Behavior
No response
Steps To Reproduce
- Follow the official p-tuning fine-tuning instructions directly.
Environment
- OS: Ubuntu 20.04
- Python: 3.10.11
- Transformers: 4.29.2
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
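
As an optional sanity check (not part of the original report), it can help to confirm that all 8 GPUs are actually visible to PyTorch and busy during training before looking at launcher settings:

```bash
# How many CUDA devices PyTorch can see; should print 8 on this machine.
python -c "import torch; print(torch.cuda.device_count())"
# During training, every GPU should show memory in use and one python process.
nvidia-smi --query-gpu=index,name,memory.used --format=csv
```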
Anything else?
No response
Haven't you noticed that the number of epochs grew by about 8×? With `--max_steps 3000` and the per-device batch size fixed, every rank still runs 3000 steps of the same size, so the wall-clock time barely changes; the extra GPUs only multiply the amount of data processed per step, which is why you end up with roughly 8× as many epochs instead of a shorter run.
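
In concrete numbers: 3000 steps × 16 samples per device × 1 GPU is about 48,000 samples, while 3000 × 16 × 8 is about 384,000 samples. Below is a minimal sketch of a launch that keeps the total sample count roughly constant so that 8 GPUs finish in about 1/8 of the time; the step arithmetic and the lowered `--save_steps` are illustrative assumptions, not values from the original report:

```bash
PRE_SEQ_LEN=128
LR=2e-2
NUM_GPUS=8                          # launch 8 data-parallel workers
MAX_STEPS=$((3000 / NUM_GPUS))      # 375 steps: same total sample count as the 1-GPU, 3000-step run
date=$(date +"%Y%m%d%H%M")

torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS main.py \
    --do_train \
    --train_file formated_q_a.json \
    --validation_file formated_q_a.json \
    --preprocessing_num_workers 10 \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path /data/vege/llm/wenda/model/chatglm2-6b \
    --output_dir /data/vege/llm/wenda/model/chatglm2-6b-ptuning-example-output/adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR-$date \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 128 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --max_steps $MAX_STEPS \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```

Note that this keeps the per-device batch at 16, so the global batch per optimizer step grows from 16 to 128; depending on the task it may be worth adjusting the learning rate, or alternatively keeping `--max_steps 3000` and dropping `--per_device_train_batch_size` to 2 so the global batch matches the single-GPU run.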