LLaMA-Factory

Single-node multi-GPU qwen1.5_7b_base FSDP + QLoRA SFT: loading the model fails with RuntimeError: Only Tensors of floating point and complex dtype can require gradients

Open · Julylmm opened this issue 2 months ago • 1 comment

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file ${cur_path}/examples/accelerate/fsdp_config.yaml \
    ${cur_path}/src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path $MODEL \
    --dataset qag_gov_gpt35_data \
    --dataset_dir $DATA \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir $OUTPUT_DIR \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 4096 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 5000 \
    --val_size 0.05 \
    --ddp_timeout 180000000 \
    --quantization_bit 4 \
    --plot_loss \
    --report_to tensorboard \
    --LOG_DIR $LOG \
    --fp16

The error is as follows: [image]
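Since the screenshot is not preserved, here is a minimal sketch of where the message in the title comes from in PyTorch itself. It does not reproduce the training run above: the helper name `try_require_grad` is hypothetical, and the suggestion that integer-typed 4-bit weight storage is what triggers the error under FSDP is an assumption, not something the report confirms.

```python
import torch


def try_require_grad(dtype):
    """Try to build a trainable parameter of the given dtype.

    Returns None on success, or the RuntimeError message if PyTorch refuses.
    (Hypothetical helper, for illustration only.)
    """
    try:
        torch.nn.Parameter(torch.zeros(4, dtype=dtype), requires_grad=True)
        return None
    except RuntimeError as err:
        return str(err)


# A floating-point tensor can be marked as requiring gradients.
float_result = try_require_grad(torch.float16)

# An integer tensor cannot; PyTorch raises the RuntimeError quoted in the title.
# Assumption: quantized weights kept in an integer storage dtype could hit this
# path if a wrapper tries to treat them as trainable parameters.
int_result = try_require_grad(torch.uint8)

print("float16:", float_result)
print("uint8:", int_result)
```

If this is the cause here, the thing to check would be which dtype the quantized weights are stored in when the FSDP wrapper sees them; the report does not include enough information to confirm that.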

Expected behavior

No response

System Info

No response

Others

No response

Julylmm · Apr 10 '24 02:04