Uneven GPU memory allocation when fine-tuning Qwen2.5-14B on multiple GPUs
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
environment:
llamafactory 0.9.1.dev0
liger_kernel 0.5.3
torch 2.4.0
transformers 4.44.0
flash-attn 2.7.2.post1
deepspeed 0.14.4
A100 80GB *2
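(For reference, an environment listing like the one above can be regenerated roughly as follows; this is just a sketch and assumes the llama-factory conda env is active.)
# Sketch: print the LLaMA-Factory environment summary
llamafactory-cli env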
yaml:
### model
model_name_or_path: /root/sdb2/Qwen2.5-14B-Instruct
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
deepspeed: /root/LLaMA_Factory/LLaMA-Factory/examples/deepspeed/ds_z3_offload_config.json
enable_liger_kernel: True
flash_attn: fa2
### dataset
dataset: deepseek-110k
template: qwen
cutoff_len: 12800
max_samples: 27500
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: /root/LoRA/LoRA_deepseek110k/qwen_model/deepseek110k_finetune_qwen2.5-14b-lora_lr1e-4_r16_alpha32_ld0.05_20250224
logging_steps: 500
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.03
bf16: true
ddp_timeout: 180000000
### eval
val_size: 0.1
per_device_eval_batch_size: 4
eval_strategy: steps
eval_steps: 200
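(For context, a two-GPU run of a config like the one above is usually launched along these lines; the actual launch command was not included in the report, and the yaml filename here is hypothetical.)
# Sketch: typical two-GPU launch for the yaml above (filename is hypothetical)
CUDA_VISIBLE_DEVICES=0,1 FORCE_TORCHRUN=1 llamafactory-cli train qwen2.5_14b_lora_sft.yaml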
GPU memory usage:
1084564 root 1 Compute 100% 72782MiB 89% 100% 27464MiB /root/anaconda3/envs/llama-factory/bin/python -u /root/LLaMA_Factory/LLaMA-Factory/src/llamafactory/laun
1084563 root 0 Compute 99% 40408MiB 49% 101% 27486MiB /root/anaconda3/envs/llama-factory/bin/python -u /root/LLaMA_Factory/LLaMA-Factory/src/llamafactory/laun
One GPU is always almost fully occupied while the other only uses about half of its memory, and after training for a while the run hits an OOM and crashes. How can I fix this?
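(As a diagnostic sketch, not part of the original report: per-GPU memory can be sampled over time to confirm the imbalance before the OOM, assuming nvidia-smi is on the PATH.)
# Sketch: sample per-GPU memory every 5 seconds
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 5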
Reproduction
Others
No response