how to load checkpoint-200?
I use the following command to train on my own data. Note that model_name_or_path is vicuna-7b-1.1, not LLaMA 7B. Is there any problem with training like this? Also, how do I load the step checkpoint saved at checkpoint-200? Do I need to convert it to Hugging Face format, and if so, how? (One possible approach is sketched after the command below.)
```shell
WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0 nohup python fastchat/train/train_lora.py \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --model_name_or_path /data/candowu/vicuna-7b-1.1 \
    --data_path ./vicuna_km_data.json \
    --bf16 True \
    --output_dir output \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True >train.log 2>&1 &
```
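For reference, below is a minimal sketch of one way to load such a step checkpoint, assuming `output/checkpoint-200` contains the PEFT adapter files (`adapter_config.json` plus the adapter weights) rather than a full model; the merged output directory name is a placeholder.

```python
# A minimal sketch, assuming output/checkpoint-200 holds a PEFT LoRA adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_path = "/data/candowu/vicuna-7b-1.1"  # same base model used for training
adapter_path = "output/checkpoint-200"           # Trainer step checkpoint

# Load the frozen base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# Optional: fold the LoRA weights into the base model and save a plain
# Hugging Face checkpoint, so inference no longer needs PEFT at all.
merged = model.merge_and_unload()
merged.save_pretrained("output/vicuna-7b-lora-merged")  # placeholder path
tokenizer.save_pretrained("output/vicuna-7b-lora-merged")
```

After merging, the output directory is an ordinary Hugging Face checkpoint, so no separate conversion step should be needed. Depending on your FastChat version, a merge helper such as `fastchat/model/apply_lora.py` may also be available. If the goal is instead to resume training from the checkpoint, the Hugging Face `Trainer` can pick it up directly via `trainer.train(resume_from_checkpoint=True)` rather than loading it by hand.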