
Uneven GPU memory allocation when fine-tuning Qwen2.5-14B on multiple GPUs

Open Jimmy-L99 opened this issue 3 days ago • 2 comments

Reminder

  • [x] I have read the above rules and searched the existing issues.

System Info

environment:

llamafactory            0.9.1.dev0
liger_kernel            0.5.3
torch                   2.4.0
transformers            4.44.0
flash-attn              2.7.2.post1
deepspeed               0.14.4
A100 80GB *2
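
For context, a quick sanity check (a minimal sketch, not part of the original report) that both A100s are visible to PyTorch before launching training:

# Minimal sketch (assumption: run inside the llama-factory conda env):
# confirm both A100s are visible and report their total memory.
import torch

print("torch:", torch.__version__)
print("visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i} {props.name} {props.total_memory / 1024**3:.0f} GiB")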

yaml:

### model
model_name_or_path: /root/sdb2/Qwen2.5-14B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
deepspeed: /root/LLaMA_Factory/LLaMA-Factory/examples/deepspeed/ds_z3_offload_config.json
enable_liger_kernel: True
flash_attn: fa2

### dataset
dataset: deepseek-110k
template: qwen
cutoff_len: 12800
max_samples: 27500
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /root/LoRA/LoRA_deepseek110k/qwen_model/deepseek110k_finetune_qwen2.5-14b-lora_lr1e-4_r16_alpha32_ld0.05_20250224
logging_steps: 500
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.03
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 4
eval_strategy: steps
eval_steps: 200
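
For reference, the effective global batch size implied by this config (assuming both A100s act as data-parallel ranks under DeepSpeed ZeRO-3) works out as below; this is a sketch of the arithmetic, not something stated in the report:

# Effective global batch size implied by the config above
# (assumption: 2 data-parallel ranks, one per A100).
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
num_gpus = 2

global_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(global_batch_size)  # 16 sequences of up to cutoff_len=12800 tokens per optimizer step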

GPU memory usage:

1084564 root   1  Compute 100%  72782MiB  89%   100%  27464MiB /root/anaconda3/envs/llama-factory/bin/python -u /root/LLaMA_Factory/LLaMA-Factory/src/llamafactory/laun
1084563 root   0  Compute  99%  40408MiB  49%   101%  27486MiB /root/anaconda3/envs/llama-factory/bin/python -u /root/LLaMA_Factory/LLaMA-Factory/src/llamafactory/laun

One GPU always ends up with very high memory usage while the other sits at roughly half; after training for a while the run OOMs on its own. How can I fix this?
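
To narrow down which rank is accumulating memory over time, a small per-rank probe like the following could be called periodically during training (a hypothetical helper, not part of the original report):

# Hypothetical per-rank memory probe (not from the original report):
# log allocated/reserved/peak CUDA memory on the local device so the
# imbalance between the two ranks can be tracked across training steps.
import os
import torch

def log_gpu_memory(tag=""):
    rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{rank}")
    allocated = torch.cuda.memory_allocated(device) / 1024**3
    reserved = torch.cuda.memory_reserved(device) / 1024**3
    peak = torch.cuda.max_memory_allocated(device) / 1024**3
    print(f"[rank {rank}] {tag} allocated={allocated:.1f} GiB "
          f"reserved={reserved:.1f} GiB peak={peak:.1f} GiB")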

Reproduction

Others

No response

Jimmy-L99 · Feb 24 '25 15:02