yanmtt
Which pre-trained model should we use for fine-tuning?
I have pre-trained the IndicBART model on new monolingual data, and two models are saved in the model path: 1) IndicBART and 2) IndicBART_puremodel. Which one should we use during fine-tuning?
The IndicBART checkpoint is 2.4 GB and the pure model is 932 MB.
Either will work.

Use the pure model with the flag --pretrained_model.

Use the larger model with the flag --pretrained_model and the additional flag --no_reload_optimizer_ctr_and_scheduler.

The larger checkpoint also contains the optimizer and scheduler states so that pretraining can be resumed after a crash. During fine-tuning, resetting the optimizer is more common.
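For concreteness, the two options might look like the sketch below. The script name (train_nmt.py) and all paths are illustrative assumptions, not taken from this thread; only the two flags discussed above are from the answer itself.

```shell
# Option 1: start from the pure model (weights only).
# NOTE: script name and paths are hypothetical placeholders.
python train_nmt.py \
    --pretrained_model /path/to/IndicBART_puremodel \
    --model_path /path/to/finetuned_model

# Option 2: start from the larger checkpoint, and discard its saved
# optimizer/scheduler states so fine-tuning begins with a fresh optimizer.
python train_nmt.py \
    --pretrained_model /path/to/IndicBART \
    --no_reload_optimizer_ctr_and_scheduler \
    --model_path /path/to/finetuned_model
```

Both variants load the same model weights; the only difference is whether the optimizer and scheduler states stored in the larger checkpoint are skipped or absent.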