LLaMA-Factory

Single node, multiple GPUs: GPU memory suddenly increases when saving the model

Open · DinahD967102ick opened this issue 4 months ago · 1 comment

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

deepspeed --num_gpus 2 src/train_bash.py \
    --deepspeed ds_config.json \
    --stage sft \
    --do_train \
    --model_name_or_path /home/zhouyu/pretrained_model/llm/Yi-34B-Chat \
    --dataset_dir data \
    --dataset alpaca_gpt4_zh \
    --template yi \
    --finetuning_type lora \
    --lora_target all \
    --output_dir output/yi-34-chat \
    --preprocessing_num_workers 32 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 2 \
    --eval_steps 2 \
    --val_size 0.001 \
    --learning_rate 1e-5 \
    --num_train_epochs 3.0 \
    --evaluation_strategy steps \
    --plot_loss \
    --bf16
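
For readers trying to reproduce this, it may help to see how often a save is triggered with these flags. The sketch below is only arithmetic on the values in the command above; the variable and function names are illustrative, and it assumes `--save_steps` counts optimizer updates (as in the Hugging Face Trainer), not micro-batches.

```python
# Values copied from the reproduction command; names are illustrative only.
num_gpus = 2                      # deepspeed --num_gpus 2
per_device_train_batch_size = 4   # --per_device_train_batch_size 4
gradient_accumulation_steps = 4   # --gradient_accumulation_steps 4
save_steps = 2                    # --save_steps 2


def effective_batch_size(gpus: int, per_device: int, accum: int) -> int:
    """Samples consumed per optimizer update across all GPUs."""
    return gpus * per_device * accum


samples_per_step = effective_batch_size(
    num_gpus, per_device_train_batch_size, gradient_accumulation_steps
)
# Assumption: a checkpoint is written every `save_steps` optimizer updates.
samples_between_saves = samples_per_step * save_steps

print(f"samples per optimizer step: {samples_per_step}")
print(f"samples between checkpoint saves: {samples_between_saves}")
```

With `--save_steps 2` the save path runs almost immediately after training starts, so any memory increase tied to saving should show up within the first couple of optimizer steps.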

Expected behavior

[Two screenshots attached; images not reproduced here.]

System Info

No response

Others

No response

DinahD967102ick · Feb 13 '24 15:02