
OutOfMemoryError: CUDA out of memory.

Open · sk142857 opened this issue 1 year ago · 6 comments

Hardware: RTX A5000 (24GB) × 5; RAM: 210 GB; Model: moss-moon-003-base

Training fails with the following error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 3.80 GiB (GPU 0; 23.69 GiB total capacity; 17.46 GiB already allocated; 850.56 MiB free; 22.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting 
max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
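As a side note, the max_split_size_mb hint from the message can be tried by setting PyTorch's allocator environment variable before launching; this only mitigates fragmentation and does not free up more memory, so it is unlikely to be a complete fix on its own (the value below is an assumption to tune):

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128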

Is there something wrong with my parameter settings?

num_machines=1
num_processes=5
machine_rank=0

accelerate launch \
	--config_file ./configs/sft.yaml \
	--num_processes $num_processes \
	--num_machines $num_machines \
	--machine_rank $machine_rank \
	--deepspeed_multinode_launcher standard finetune_moss.py \
	--model_name_or_path /root/autodl-tmp/moss/fnlp/moss-moon-003-base \
	--data_dir ./sft_data \
	--output_dir ./ckpts/moss-moon-003-sft \
	--log_dir ./train_logs/moss-moon-003-sft \
	--n_epochs 2 \
	--train_bsz_per_gpu 1 \
	--eval_bsz_per_gpu 1 \
	--learning_rate 0.000015 \
	--eval_step 200 \
	--save_step 2000


sk142857 · May 11, 2023

Hardware: RTX 6000 Ada (48GB) × 1; RAM: 512 GB; Model: moss-moon-003-base

Same problem here.

lhtpluto · May 17, 2023

Modify sft.yaml so that deepspeed_config reads:

deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3

This offloads the DeepSpeed optimizer and parameter states to the CPU.
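For reference, this is roughly where that block sits in the full accelerate config file; every key outside deepspeed_config below is an assumed, typical accelerate setting and may differ from the repository's actual sft.yaml:

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false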

Problem solved.

Hardware: RTX 6000 Ada (48GB) × 1; RAM: 512 GB; Model: moss-moon-003-base

Fine-tuning is now running.

lhtpluto · May 17, 2023

Hi, may I ask what batch size you used for fine-tuning in your environment, and what the peak RAM usage was (with everything offloaded to the CPU)?

Daniel-1997 · May 23, 2023

bs = 1, with DeepSpeed offloaded to the CPU, needs roughly 290 GB of RAM. Also, DeepSpeed does not support multi-threading here, so it is severely limited by the CPU's single-core performance.

lhtpluto · May 24, 2023

How do I fix this error during inference with moss_cli_demo.py? RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 31.75 GiB total capacity; 30.01 GiB already allocated; 548.00 MiB free; 30.02 GiB reserved in total by PyTorch). No DeepSpeed parameters are involved at this point.

Cocoalate · Aug 16, 2023

For inference, use a quantized model, e.g. int8 or int4.
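A minimal sketch of what quantized inference can look like with transformers; the checkpoint name below (fnlp/moss-moon-003-sft-int4) and the exact loading calls are assumptions, so check the fnlp organization on Hugging Face for the actual quantized variants:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "fnlp/moss-moon-003-sft-int4"  # assumed int4 checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# A quantized checkpoint keeps the whole model on one GPU in far less memory
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).half().cuda()
model.eval()

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))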

lhtpluto · Oct 30, 2023