[BUG] Hang when running LoRA SFT on Qwen2.5-Omni with multi-turn video-audio samples under ds-z3 or ds-z3-offload
**Describe the bug**

We've discussed this at length in the LLaMA-Factory community: https://github.com/hiyouga/LLaMA-Factory/issues/7767. Strangely, the issue only occurs when we use ds-z3 or ds-z3-offload.
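To help localize a hang like this, dumping the Python stack of every rank shows whether one process is blocked in a ZeRO-3 collective while the others have moved on. This is a hypothetical debugging sketch, not part of the original report; py-spy and the process-name pattern below are assumptions, so adjust them to your launcher:

```bash
# Hypothetical sketch: dump the Python stack of each training process.
# Assumes py-spy is installed (pip install py-spy) and that the ranks
# were spawned via llamafactory-cli; adjust the pgrep pattern as needed.
for pid in $(pgrep -f "llamafactory-cli train"); do
    echo "=== process $pid ==="
    py-spy dump --pid "$pid"
done
```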
**To Reproduce**
- Env preparation:

```bash
# prepare transformers with the following cmd to train qwen2.5omni
pip install git+https://github.com/Kuangdd01/transformers.git@qwen25omni
# use this commit if training with mmdata in the system prompt
pip install git+https://github.com/Luffy-ZY-Wang/LLaMA-Factory.git@dev_my_branch
```
- Data preparation: follow https://github.com/hiyouga/LLaMA-Factory/issues/7767#issuecomment-2823732424
- llamafactory config (a sketch of the referenced ZeRO-3 file and a typical launch command follow after the block):

```yaml
### model
model_name_or_path: ./Qwen2.5-Omni-7B
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: ./examples/deepspeed/ds_z3_config.json

### dataset
dataset: mllm_mmsys_multiturn_video_audio_demo_alpaca
template: qwen2_omni
cutoff_len: 8192
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/qwen2_omni-7b-video/lora/sft
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false

### train
use_audio_in_video: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
freeze_vision_tower: true
learning_rate: 1.0e-4
num_train_epochs: 25.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
```
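The `deepspeed` entry above points at LLaMA-Factory's stock ZeRO-3 config. For completeness, here is a minimal sketch of what `examples/deepspeed/ds_z3_config.json` typically contains in that repo; the exact values in your checkout may differ, so treat this as an illustration rather than the precise file we ran with:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": { "enabled": "auto" },
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

The exact launch command isn't captured above; a typical 8-GPU launch, assuming the YAML is saved as `qwen2_omni_video_lora_sft.yaml` (a filename chosen here for illustration), looks like:

```bash
# FORCE_TORCHRUN=1 makes llamafactory-cli launch via torchrun so that all
# 8 GPUs join the DeepSpeed ZeRO-3 run; the YAML filename is hypothetical.
FORCE_TORCHRUN=1 llamafactory-cli train qwen2_omni_video_lora_sft.yaml
```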
**System info**
- llamafactory version: 0.9.3.dev0
- Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31
- Python version: 3.12.9
- PyTorch version: 2.6.0+cu118 (GPU)
- Transformers version: 4.50.0.dev0
- Datasets version: 3.4.1
- Accelerate version: 1.5.2
- PEFT version: 0.15.1
- TRL version: 0.9.6
- GPU type: NVIDIA GeForce RTX 4090
- GPU number: 8
- GPU memory: 23.65GB
- DeepSpeed version: 0.16.4
- vLLM version: 0.8.1