Qwen2.5-VL full SFT dtype error
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
- llamafactory version: 0.9.2.dev0
- Platform: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version: 2.5.1+cu124 (GPU)
- Transformers version: 4.49.0.dev0
- Datasets version: 3.2.0
- Accelerate version: 1.2.1
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA A800-SXM4-80GB
- DeepSpeed version: 0.16.2
- vLLM version: 0.6.5
Reproduction
Training script:
```yaml
### model
model_name_or_path: /model/base/qwen/Qwen2.5-VL-7B-Instruct
### method
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: true # choices: [true, false]
train_mm_proj_only: false # choices: [true, false]
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
flash_attn: fa2
### dataset
dataset: longwriter-v-10k
template: qwen2_vl
cutoff_len: 32768
# max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 8
### output
output_dir: /model/trained/qwen/qwen2.5_vl-7b
logging_steps: 1
save_steps: 100
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-5
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### eval
# val_size: 0.001
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 100
```
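For reference, this is a plain LLaMA-Factory YAML config, typically launched with `llamafactory-cli train <config>.yaml`. Below is a small sanity-check sketch for the keys implicated in this report; the filename is hypothetical:

```python
# Sanity-check sketch for the YAML above; "qwen2_5vl_full_sft.yaml" is a
# hypothetical filename for wherever the config is saved.
import yaml  # PyYAML

with open("qwen2_5vl_full_sft.yaml") as f:
    cfg = yaml.safe_load(f)

# The combination implicated in this report: bf16 training with flash-attn 2
# while the vision tower is frozen.
assert cfg["bf16"] is True and cfg["flash_attn"] == "fa2"
assert cfg["freeze_vision_tower"] is True
print(cfg["model_name_or_path"], "->", cfg["output_dir"])
```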
Error message:
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/app/src/llamafactory/launcher.py", line 23, in <module>
[rank0]: launch()
[rank0]: File "/app/src/llamafactory/launcher.py", line 19, in launch
[rank0]: run_exp()
[rank0]: File "/app/src/llamafactory/train/tuner.py", line 92, in run_exp
[rank0]: _training_function(config={"args": args, "callbacks": callbacks})
[rank0]: File "/app/src/llamafactory/train/tuner.py", line 66, in _training_function
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/app/src/llamafactory/train/sft/workflow.py", line 101, in run_sft
[rank0]: train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2184, in train
[rank0]: return inner_training_loop(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2490, in _inner_training_loop
[rank0]: tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3598, in training_step
[rank0]: loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3659, in compute_loss
[rank0]: outputs = model(**inputs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]: ret_val = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1914, in forward
[rank0]: loss = self.module(*inputs, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1739, in forward
[rank0]: image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 496, in forward
[rank0]: hidden_states = self._gradient_checkpointing_func(
[rank0]: File "/app/src/llamafactory/model/model_utils/checkpointing.py", line 93, in custom_gradient_checkpointing_func
[rank0]: return gradient_checkpointing_func(func, *args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 32, in inner
[rank0]: return disable_fn(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 489, in checkpoint
[rank0]: return CheckpointFunction.apply(function, preserve, *args)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 575, in apply
[rank0]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 264, in forward
[rank0]: outputs = run_function(*args)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 296, in forward
[rank0]: hidden_states = hidden_states + self.attn(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 185, in forward
[rank0]: q = apply_rotary_pos_emb_flashatt(q.unsqueeze(0), rotary_pos_emb).squeeze(0)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 166, in apply_rotary_pos_emb_flashatt
[rank0]: output = apply_rotary_emb(tensor_, cos, sin).type_as(tensor)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/layers/rotary.py", line 122, in apply_rotary_emb
[rank0]: return ApplyRotaryEmb.apply(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 575, in apply
[rank0]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank0]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/layers/rotary.py", line 48, in forward
[rank0]: out = apply_rotary(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/ops/triton/rotary.py", line 176, in apply_rotary
[rank0]: x.dtype == cos.dtype
[rank0]: AssertionError: Input and cos/sin must have the same dtype, got torch.float32 and torch.bfloat16
```
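For what it's worth, the failing check is the dtype guard in flash-attn's Triton rotary kernel: the input reaches the kernel as float32 (note the `tensor_` / `.type_as(tensor)` round-trip in the traceback) while the cos/sin tables are still bfloat16 under bf16 training. A minimal pure-PyTorch sketch of just that guard (shapes are illustrative, not the real kernel):

```python
import torch

# Minimal sketch of the guard that fires in flash_attn/ops/triton/rotary.py:
# the kernel requires the input and the cos/sin tables to share a dtype.
x = torch.randn(16, 2, 64, dtype=torch.float32)   # q/k arrive as fp32
cos = torch.randn(16, 32, dtype=torch.bfloat16)   # rotary table left in bf16
assert x.dtype == cos.dtype, (
    f"Input and cos/sin must have the same dtype, got {x.dtype} and {cos.dtype}"
)  # -> AssertionError, matching the message above
```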
Others
No response
Related issue: https://github.com/QwenLM/Qwen2.5-VL/issues/706
Thank you for your feedback. We have submitted a PR to address this issue.
I tried the latest transformers at commit 626666c, and the problem still persists.
fixed in https://github.com/huggingface/transformers/pull/36188
v4.49.0 does not include the fix yet (just `git checkout` v4.49.0 and see for yourself). I hit this problem too: transformers==4.49.0 raises this error. You have to install from source with
pip install git+https://github.com/huggingface/transformers
and then training works normally.
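If you are unsure whether your install already contains the fix, a quick version check may help (the 4.49.0 cutoff is taken from the comments in this thread):

```python
# Check that the installed transformers postdates the broken 4.49.0 release;
# per this thread, the fix only landed on main (4.50.0.dev) at the time.
from packaging import version

import transformers

assert version.parse(transformers.__version__) > version.parse("4.49.0"), (
    "still on <= 4.49.0; install from source: "
    "pip install git+https://github.com/huggingface/transformers"
)
print("transformers", transformers.__version__, "- should include the fix")
```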
This bug was fixed in commit 8ee5053 (pull/36188).
Confirmed, it only works starting from v4.50.0.dev.
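If upgrading transformers is not an option right away, a local monkey-patch along the lines of the upstream fix (cast the cos/sin tables to float32 so they match the float32-cast input) may serve as a stopgap. This is a sketch of the idea, not the exact upstream diff, and should be removed once you upgrade:

```python
# Stopgap sketch mirroring the idea of huggingface/transformers#36188:
# make the cos/sin tables float32 so they match the float32-cast q/k.
# Run this before building the trainer; drop it once transformers is upgraded.
import torch
from flash_attn.layers.rotary import apply_rotary_emb
import transformers.models.qwen2_5_vl.modeling_qwen2_5_vl as qwen2_5_vl


def _patched_apply_rotary_pos_emb_flashatt(
    tensor: torch.Tensor, freqs: torch.Tensor
) -> torch.Tensor:
    cos = freqs.cos().float()  # previously left in bf16, tripping the assertion
    sin = freqs.sin().float()
    return apply_rotary_emb(tensor.float(), cos, sin).type_as(tensor)


# The vision attention forward looks this helper up in module globals,
# so rebinding the module attribute is enough.
qwen2_5_vl.apply_rotary_pos_emb_flashatt = _patched_apply_rotary_pos_emb_flashatt
```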