yanmtt
Which pre-trained model should we use for fine-tuning?
I have pre-trained the IndicBART model on new monolingual data, and two models are saved in the model path: 1) IndicBART and 2) IndicBART_puremodel. Which one should we use during fine-tuning?
The IndicBART checkpoint is 2.4 GB and the pure model is 932 MB.
Either will work.

Use the pure model with the flag --pretrained_model.

Use the larger model with the flag --pretrained_model and the additional flag --no_reload_optimizer_ctr_and_scheduler.

The larger checkpoint also contains the optimizer and scheduler states so that pretraining can be resumed after a crash. During fine-tuning, resetting the optimizer is more common.
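For concreteness, the two options might look like the sketch below. The script name (train_nmt.py) and all paths are illustrative assumptions, not taken from this thread; only the two flags discussed above are from the answer itself.

```shell
# Option 1: start from the pure model (weights only).
# NOTE: script name and paths are hypothetical placeholders.
python train_nmt.py \
    --pretrained_model /path/to/IndicBART_puremodel \
    --model_path /path/to/finetuned_model

# Option 2: start from the larger checkpoint, and discard its saved
# optimizer/scheduler states so fine-tuning begins with a fresh optimizer.
python train_nmt.py \
    --pretrained_model /path/to/IndicBART \
    --no_reload_optimizer_ctr_and_scheduler \
    --model_path /path/to/finetuned_model
```

Both variants load the same model weights; the only difference is whether the optimizer and scheduler states stored in the larger checkpoint are skipped or absent.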