DeepSpeedExamples
LoRA problem: out of memory with a 3B model with LoRA on a 32 GB GPU with batch size 2
deepspeed --master_port 25604 --num_gpus 1 main.py \
   --data_path mydata/ \
   --data_split 0,10,0 \
   --num_padding_at_beginning 0 \
   --model_name_or_path bloom_3b1/ \
   --per_device_train_batch_size 2 \
   --per_device_eval_batch_size 2 \
   --max_seq_len 1024 \
   --learning_rate 9.65e-7 \
   --weight_decay 0.1 \
   --num_train_epochs 50 \
   --gradient_accumulation_steps 1 \
   --log_interval 10 \
   --lr_scheduler_type cosine \
   --num_warmup_steps 0 \
   --seed 1234 \
   --lora_dim 8 \
   --only_optimize_lora \
   --lora_module_name rwtranrsformer.h.1 \
   --zero_stage 3 \
   --deepspeed \
   --output_dir $OUTPUT \
   &> $OUTPUT/training.log
When I launch the code with the shell command above, an out-of-memory error occurs for the 3B model with LoRA on a 32 GB GPU with batch size 2.
I think a 3B model with LoRA at batch size 2 should easily fit on a 32 GB GPU, so can you help me solve this problem?
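One common way to reduce GPU memory pressure with the DeepSpeed-Chat training scripts is to enable activation (gradient) checkpointing and ZeRO offload, and to shorten the sequence length, since activations at `--max_seq_len 1024` are a large part of the footprint even when only LoRA weights are optimized. The sketch below is an assumption-laden variant of the launch command: the `--gradient_checkpointing` and `--offload` flags are taken from the example scripts in this repo and may not exist in your version of `main.py`, so confirm with `python main.py --help` first.

```shell
# Hedged sketch: a lower-memory variant of the launch command above.
# --gradient_checkpointing and --offload are ASSUMED flag names from the
# DeepSpeed-Chat example scripts; verify they exist in your main.py.
deepspeed --master_port 25604 --num_gpus 1 main.py \
   --data_path mydata/ \
   --data_split 0,10,0 \
   --num_padding_at_beginning 0 \
   --model_name_or_path bloom_3b1/ \
   --per_device_train_batch_size 2 \
   --per_device_eval_batch_size 2 \
   --max_seq_len 512 \
   --learning_rate 9.65e-7 \
   --weight_decay 0.1 \
   --num_train_epochs 50 \
   --gradient_accumulation_steps 1 \
   --lr_scheduler_type cosine \
   --num_warmup_steps 0 \
   --seed 1234 \
   --lora_dim 8 \
   --only_optimize_lora \
   --lora_module_name rwtranrsformer.h.1 \
   --zero_stage 3 \
   --offload \
   --gradient_checkpointing \
   --deepspeed \
   --output_dir $OUTPUT \
   &> $OUTPUT/training.log
```

Note that lowering `--max_seq_len` to 512 changes the training setup, so only keep that change if your data allows it; the offload and checkpointing flags alone may already be enough to fit the 3B model in 32 GB.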
Hi, have you solved this problem?