
deepspeed problem with train_lora.py

Open · XqZeppelinhead0702 opened this issue 10 months ago · 0 comments

Hello, recently I've tried to fine-tune vicuna with train_lora.py and encountered an error that I failed to resolve. I used the following launch script:

deepspeed $CODE_PATH \
    --model_name_or_path $MODEL_PATH \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --data_path $DATA_PATH \
    --bf16 True \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --q_lora True \
    --deepspeed $DS_PATH \

and the error is as follows:

[screenshot: QQ20241230-151344]

I checked the previous issues, removed the deepspeed import, and modified the failing line as suggested in issues/1458#issuecomment-1598308288, but I still hit a similar error at line 124 of train_lora.py:

[screenshot: QQ20241230-152031]

That line obviously cannot be fixed in the same way, because the trainer has not yet been created at line 124 inside train(). Is there another approach that would work there? (One direction I was considering is sketched below, though I am not sure it is right.) Or is it simply that the versions of transformers and deepspeed in my environment do not match the current repo?
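In case it helps clarify the question, this is the kind of change I was considering. It is an untested sketch: I am assuming the check at line 124 is a ZeRO-3 check (I cannot paste the exact line here), and I am using the standalone helper from transformers.integrations, which still exists in 4.47.1 and does not need a Trainer instance:

    import logging

    from transformers.integrations import is_deepspeed_zero3_enabled

    # Hypothetical replacement for the failing check at line 124 (my assumption,
    # not the actual FastChat code). The helper comes from transformers itself
    # and can be called before the Trainer is constructed in train().
    if is_deepspeed_zero3_enabled():
        # ... keep whatever the original branch does here, e.g. log a warning ...
        logging.warning("ZeRO-3 detected before the trainer is created.")

I am not sure whether this helper already reflects the DeepSpeed config at that point in train(), so please correct me if this is the wrong direction.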

My environment: transformers==4.47.1 and deepspeed==0.16.2

XqZeppelinhead0702 · Dec 30 '24 07:12