FastChat
A model trained by train_lora.py does not have config.json
After training, I got optimizer.pt, pytorch_model.bin, rng_state.pth, scaler.pt, scheduler.pt, trainer_state.json, and training_args.bin.
A model trained by train_lora.py does not produce config.json, generation_config.json, or pytorch_model.bin.index.json.
I ran into a similar problem. Instead of the config.json needed by the original checkpoint, the PEFT model needs an adapter_config.json. Here is the command I use.
python -m fastchat.train.train_lora \
--model_name_or_path ./vicuna/vicuna-7b \
--data_path playground/data/dummy.json \
--output_dir output \
--bf16 True \
--tf32 True \
--evaluation_strategy "no" \
--lazy_preprocess True \
--save_strategy "steps" \
--save_steps 20 \
--save_total_limit 10 \
--logging_steps 1 \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 16 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine"
After training, the output folder contains no adapter_config.json. For now, a workaround with one GPU is to comment out lines 145 to 147 in train_lora.py and add the following:
model.save_pretrained(training_args.output_dir)
@zhengzangw I was able to run train_lora.py and it output the adapter_config.json, but I can't seem to run the model with the adapter. What command did you use to do so?
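For anyone else hitting this: once adapter_config.json exists, the standard PEFT way to run the model is to load the base checkpoint and attach the adapter with PeftModel.from_pretrained. A sketch along those lines (paths are placeholders; imports are kept inside the helper so the snippet reads fine even without transformers/peft installed):

```python
def load_lora_model(base_model_path: str, adapter_path: str):
    """Load a base causal LM and attach LoRA adapter weights on top of it.

    base_model_path: directory of the original checkpoint (has config.json).
    adapter_path: output dir from train_lora.py (has adapter_config.json).
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    base = AutoModelForCausalLM.from_pretrained(base_model_path)
    # from_pretrained reads adapter_config.json from adapter_path and wraps
    # the base model with the trained LoRA weights.
    model = PeftModel.from_pretrained(base, adapter_path)
    return model, tokenizer

# Example call (not run here, needs the actual checkpoints on disk):
# model, tok = load_lora_model("./vicuna/vicuna-7b", "output")
```

If you want a single merged checkpoint instead of base-plus-adapter, PEFT's merge_and_unload on the returned model folds the LoRA weights into the base model, which can then be saved and served like a normal checkpoint.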