[BUG] Hang when running LoRA SFT on Qwen2.5-Omni with multi-turn video-audio samples under ds-z3 or ds-z3-offload
**Describe the bug**

We've discussed this at length in the LLaMA-Factory community: https://github.com/hiyouga/LLaMA-Factory/issues/7767. Strangely, the issue only occurs when we use ds-z3 or ds-z3-offload.
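To help localize a hang like this, dumping the Python stack of every rank shows whether one process is blocked in a ZeRO-3 collective while the others have moved on. This is a hypothetical debugging sketch, not part of the original report; py-spy and the process-name pattern below are assumptions, so adjust them to your launcher:

```bash
# Hypothetical sketch: dump the Python stack of each training process.
# Assumes py-spy is installed (pip install py-spy) and that the ranks
# were spawned via llamafactory-cli; adjust the pgrep pattern as needed.
for pid in $(pgrep -f "llamafactory-cli train"); do
    echo "=== process $pid ==="
    py-spy dump --pid "$pid"
done
```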
**To Reproduce**
- Env preparation:

```bash
# prepare transformers with the following cmd to train qwen2.5omni
pip install git+https://github.com/Kuangdd01/transformers.git@qwen25omni
# use this commit if training with mmdata in the system prompt
pip install git+https://github.com/Luffy-ZY-Wang/LLaMA-Factory.git@dev_my_branch
```
- Data preparation: follow https://github.com/hiyouga/LLaMA-Factory/issues/7767#issuecomment-2823732424
- llamafactory config (a sketch of the referenced ZeRO-3 file and a typical launch command follow after the block):

```yaml
### model
model_name_or_path: ./Qwen2.5-Omni-7B
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: ./examples/deepspeed/ds_z3_config.json

### dataset
dataset: mllm_mmsys_multiturn_video_audio_demo_alpaca
template: qwen2_omni
cutoff_len: 8192
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/qwen2_omni-7b-video/lora/sft
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false

### train
use_audio_in_video: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
freeze_vision_tower: true
learning_rate: 1.0e-4
num_train_epochs: 25.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
```
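The `deepspeed` entry above points at LLaMA-Factory's stock ZeRO-3 config. For completeness, here is a minimal sketch of what `examples/deepspeed/ds_z3_config.json` typically contains in that repo; the exact values in your checkout may differ, so treat this as an illustration rather than the precise file we ran with:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": { "enabled": "auto" },
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

The exact launch command isn't captured above; a typical 8-GPU launch, assuming the YAML is saved as `qwen2_omni_video_lora_sft.yaml` (a filename chosen here for illustration), looks like:

```bash
# FORCE_TORCHRUN=1 makes llamafactory-cli launch via torchrun so that all
# 8 GPUs join the DeepSpeed ZeRO-3 run; the YAML filename is hypothetical.
FORCE_TORCHRUN=1 llamafactory-cli train qwen2_omni_video_lora_sft.yaml
```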
**System info**
- llamafactory version: 0.9.3.dev0
- Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31
- Python version: 3.12.9
- PyTorch version: 2.6.0+cu118 (GPU)
- Transformers version: 4.50.0.dev0
- Datasets version: 3.4.1
- Accelerate version: 1.5.2
- PEFT version: 0.15.1
- TRL version: 0.9.6
- GPU type: NVIDIA GeForce RTX 4090
- GPU number: 8
- GPU memory: 23.65GB
- DeepSpeed version: 0.16.4
- vLLM version: 0.8.1