
a model trained by train-lora does not have config.json

Open hotco87 opened this issue 2 years ago • 1 comments

After training, I got: optimizer.pt, pytorch_model.bin, rng_state.pth, scaler.pt, scheduler.pt, trainer_state.json, and training_args.bin.

A model trained by train_lora.py does not produce config.json, generation_config.json, or pytorch_model.bin.index.json.

hotco87 avatar Apr 19 '23 15:04 hotco87

I ran into a similar problem. Instead of the config.json that the original checkpoint needs, a PEFT model needs an adapter_config.json. Here is the command I use.

python -m fastchat.train.train_lora \
    --model_name_or_path ./vicuna/vicuna-7b \
    --data_path playground/data/dummy.json \
    --output_dir output \
    --bf16 True \
    --tf32 True \
    --evaluation_strategy "no" \
    --lazy_preprocess True \
    --save_strategy "steps" \
    --save_steps 20 \
    --save_total_limit 10 \
    --logging_steps 1 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine"

After training, the output folder contains no adapter_config.json. For now, a workaround with one GPU is to comment out lines 145 to 147 in train_lora.py and add the following:

model.save_pretrained(training_args.output_dir)

zhengzangw avatar Apr 24 '23 14:04 zhengzangw

@zhengzangw I was able to run train_lora.py and it output the adapter_config.json, but I can't seem to run the model with the adapter. What command did you use to do so?

karan-nova avatar Jun 06 '23 14:06 karan-nova