
[BUG/Help] OOM when running multi-GPU P-tuning fine-tuning on Windows

Open · 0x0019 opened this issue 1 year ago

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Multi-GPU fine-tuning on Windows Server fails with OOM. Error message:

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 22.50 GiB total capacity; 19.86 GiB already allocated; 0 bytes free; 19.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
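The error text itself suggests trying `max_split_size_mb`. A minimal sketch of that mitigation, set in the same batch file before launching (the value 128 MiB is an illustrative assumption, not a tested setting):

```bat
REM Cap the allocator's split size to reduce fragmentation (value in MiB; tune as needed)
SET PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
SET CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
REM same python main.py invocation and arguments as in Steps To Reproduce below
```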

Expected Behavior

No response

Steps To Reproduce

1. Reproducing the error

Fine-tuning batch file:

```bat
cd ptuning
SET CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python main.py --do_train --train_file ..\answers.json --validation_file ..\dev.json --prompt_column prompt --response_column response --overwrite_cache --model_name_or_path ..\model --output_dir ..\output --overwrite_output_dir --max_source_length 256 --max_target_length 256 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 16 --predict_with_generate --max_steps 500 --logging_steps 10 --save_steps 50 --learning_rate 2e-2 --pre_seq_len 128
pause
```

Training then fails: "Running tokenizer on train dataset" reaches 100%, and right after the train inputs are printed, the OOM error appears:

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 22.50 GiB total capacity; 19.86 GiB already allocated; 0 bytes free; 19.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

2. Troubleshooting

Restricting the same batch file to a single GPU (all other arguments unchanged):

```bat
cd ptuning
SET CUDA_VISIBLE_DEVICES=0
python main.py --do_train --train_file ..\answers.json --validation_file ..\dev.json --prompt_column prompt --response_column response --overwrite_cache --model_name_or_path ..\model --output_dir ..\output --overwrite_output_dir --max_source_length 256 --max_target_length 256 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 16 --predict_with_generate --max_steps 500 --logging_steps 10 --save_steps 50 --learning_rate 2e-2 --pre_seq_len 128
pause
```

With a single GPU, training runs normally and no OOM occurs; it is just a bit slow.
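For reference, a simple way to watch per-GPU memory while reproducing (nvidia-smi ships with the driver on Windows as well); this is an observation aid, not part of the original report:

```bat
REM Print index, used memory, and total memory for every GPU once per second
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1
```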

Environment

- OS: Windows Server 2019
- Python: 3.10.9
- Transformers: 4.27.1
- PyTorch: 2.0.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Other information
- CUDA version: 11.8
- GPUs: Tesla P40 × 8
- Using Anaconda: No
- Training set size: 34.7 MB

Anything else?

No response

0x0019 · May 04 '23 11:05

Launch multi-GPU mode with deepspeed.

zhanshijinwat · May 06 '23 14:05

A100 80G × 2, OOM:

```sh
deepspeed --num_gpus=2 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file ../data/2w.csv \
    --test_file ../data/2k.csv \
    --prompt_column prompts \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path ../chatglm-6b \
    --output_dir ./output/xw-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --max_steps 10000 \
    --logging_steps 100 \
    --save_steps 5000 \
    --learning_rate $LR \
    --fp16
```
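For context, `--deepspeed deepspeed.json` points at a DeepSpeed config file, and it is the ZeRO settings in that file that actually partition optimizer state and gradients across GPUs. A minimal ZeRO stage-2 sketch (field values are illustrative assumptions, not the poster's actual file):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```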

> Launch multi-GPU mode with deepspeed.

xiamaozi11 · May 08 '23 12:05

Solved, thanks to you both @zhanshijinwat @xiamaozi11

0x0019 · May 10 '23 11:05

> Solved, thanks to you both @zhanshijinwat @xiamaozi11

Did you manage to train successfully with deepspeed on Windows?

Tongjilibo · Jun 10 '23 08:06

@0x0019 @zhanshijinwat @xiamaozi11 Did you end up using deepspeed, or is it still P-tuning fine-tuning? Is this run via the ds_train_finetune.sh file? I'm a beginner; some guidance would be much appreciated, thanks.

niuhuluzhihao · Jun 16 '23 15:06