LLaMA-Factory
GaLore full-parameter supervised fine-tuning of Qwen1.5 series models always fails with RuntimeError: value cannot be converted to type at::Half without overflow
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
2024-04-02 12:34:42 Exception in thread Thread-49 (run_exp):
2024-04-02 12:34:42 Traceback (most recent call last):
2024-04-02 12:34:42   File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
2024-04-02 12:34:42     self.run()
2024-04-02 12:34:42   File "/usr/lib/python3.10/threading.py", line 953, in run
2024-04-02 12:34:42     self._target(*self._args, **self._kwargs)
2024-04-02 12:34:42   File "/app/src/llmtuner/train/tuner.py", line 32, in run_exp
2024-04-02 12:34:42     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
2024-04-02 12:34:42   File "/app/src/llmtuner/train/sft/workflow.py", line 71, in run_sft
2024-04-02 12:34:42     train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1780, in train
2024-04-02 12:34:42     return inner_training_loop(
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2118, in _inner_training_loop
2024-04-02 12:34:42     tr_loss_step = self.training_step(model, inputs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3036, in training_step
2024-04-02 12:34:42     loss = self.compute_loss(model, inputs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3059, in compute_loss
2024-04-02 12:34:42     outputs = model(**inputs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
2024-04-02 12:34:42     return self._call_impl(*args, **kwargs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _call_impl
2024-04-02 12:34:42     return forward_call(*args, **kwargs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 822, in forward
2024-04-02 12:34:42     return model_forward(*args, **kwargs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 810, in __call__
2024-04-02 12:34:42     return convert_to_fp32(self.model_forward(*args, **kwargs))
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
2024-04-02 12:34:42     return func(*args, **kwargs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1173, in forward
2024-04-02 12:34:42     outputs = self.model(
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
2024-04-02 12:34:42     return self._call_impl(*args, **kwargs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _call_impl
2024-04-02 12:34:42     return forward_call(*args, **kwargs)
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1020, in forward
2024-04-02 12:34:42     attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 362, in _prepare_4d_causal_attention_mask_for_sdpa
2024-04-02 12:34:42     expanded_4d_mask = attn_mask_converter.to_4d(
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 121, in to_4d
2024-04-02 12:34:42     causal_4d_mask = self._make_causal_mask(
2024-04-02 12:34:42   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 156, in _make_causal_mask
2024-04-02 12:34:42     mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
2024-04-02 12:34:42 RuntimeError: value cannot be converted to type at::Half without overflow
Expected behavior
I selected bf16 and tried both the base and chat variants of qwen1.5-1.8B and qwen1.5-4B; all of them hit this error.
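The last frame of the traceback fills the causal mask with `torch.finfo(dtype).min` but passes no `dtype` to `torch.full`. One possible reading, which is an assumption and not something confirmed in this thread, is that `dtype` is bfloat16 while the mask tensor itself is created as float16. The sketch below uses only the Python standard library (not PyTorch) to show why that combination would overflow: the most negative bfloat16 value is far outside the float16 range. The names `BF16_MIN`, `FP16_MIN`, and `fits_in_half` are hypothetical and exist only for this illustration.

```python
import struct

# Most negative finite value of each format, derived from the bit layout.
# bfloat16: 8 exponent bits, 7 mantissa bits.
BF16_MIN = -(2.0 - 2.0 ** -7) * 2.0 ** 127
# float16 (IEEE 754 half): 5 exponent bits, 10 mantissa bits.
FP16_MIN = -(2.0 - 2.0 ** -10) * 2.0 ** 15


def fits_in_half(value: float) -> bool:
    """Return True if `value` can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the struct code for binary16
        return True
    except (OverflowError, struct.error):
        return False


if __name__ == "__main__":
    print("float16 min fits in half:", fits_in_half(FP16_MIN))
    print("bfloat16 min fits in half:", fits_in_half(BF16_MIN))
```

If this reading applies, the interesting question is where the float16 tensor type comes from in a run that requests bf16; the sketch does not answer that, it only illustrates the overflow condition named in the error message.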
System Info
RTX3090
Others
No response
I can't reproduce this. Could you share the exact library versions? (Or try updating the libraries and the repository.)
I am using transformers==4.39.2
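For reporting versions in one go, a small sketch like the following may help. It uses the standard-library `importlib.metadata` and queries only the three packages that appear in the traceback above; the helper name `report_versions` and the package list are assumptions made for illustration, so extend the list as needed.

```python
from importlib import metadata

# Distributions that show up in the traceback; adjust as needed.
PACKAGES = ("transformers", "torch", "accelerate")


def report_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}=={version}")
```

Pasting the printed lines into the issue gives maintainers the same `name==version` format used in the comment above.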
Command line: python src/train_bash.py --stage sft --do_train --model_name_or_path /path-to/qwen/ --dataset marko1616 --template qwen --finetuning_type full --optim adamw_8bit --use_galore --galore_layerwise --galore_target mlp,self_attn --galore_rank 256 --output_dir /path-to/save --overwrite_cache --overwrite_output_dir --per_device_train_batch_size 2 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --logging_steps 10 --save_steps 700 --learning_rate 1e-5 --num_train_epochs 4 --plot_loss --pure_bf16 --cutoff_len 2048 --flash_attn