FastChat
deepspeed problem with train_lora.py
Hello, I recently tried to fine-tune Vicuna with train_lora.py and encountered an error I could not resolve. I ran the following script:
deepspeed $CODE_PATH \
--model_name_or_path $MODEL_PATH \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--data_path $DATA_PATH \
--bf16 True \
--output_dir $OUTPUT_DIR \
--num_train_epochs 3 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 100 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--q_lora True \
--deepspeed $DS_PATH
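In case it is useful context, my understanding is that the --lora_r / --lora_alpha / --lora_dropout flags above end up in a PEFT LoraConfig roughly like the sketch below; the target_modules and task_type values are my own assumption, not necessarily what train_lora.py actually sets.

from peft import LoraConfig

# Rough equivalent of the LoRA flags passed above; target_modules is a guess
# (attention projections) and may differ from the defaults in train_lora.py.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)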
The error I get is as follows:
I checked the previous issues, removed the deepspeed import, and modified the failing line as suggested in issues/1458#issuecomment-1598308288, but I still hit a similar error at line 124 in train_lora.py:
Obviously it cannot be fixed in the same way, because the trainer has not been created yet at line 124 inside train(), so I wonder whether there is another way to solve this. Or maybe the versions of transformers and deepspeed in my environment do not match what the current repo expects?
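One workaround I am considering, assuming the failing call at line 124 is the ZeRO-3 check that used to be imported via transformers.deepspeed, is to use the helper that newer transformers versions expose under transformers.integrations.deepspeed, which does not need a Trainer instance. A minimal sketch of what I mean:

# Sketch only: assumes the line that fails is a ZeRO-3 check; in transformers
# 4.47 the deepspeed helpers live in transformers.integrations.deepspeed.
from transformers.integrations.deepspeed import is_deepspeed_zero3_enabled

if is_deepspeed_zero3_enabled():
    # whatever the original branch at line 124 did
    ...

I am not sure this is the intended fix, though.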
My environment: transformers==4.47.1 and deepspeed==0.16.2