
which pre-train model should we use for fine-tuning

Open Aniruddha-JU opened this issue 2 years ago • 2 comments

I have pre-trained the IndicBART model on new monolingual data, and two checkpoints are saved in the model path: 1) IndicBART and 2) IndicBART_puremodel. Which one should we use for fine-tuning?

Aniruddha-JU avatar Aug 25 '22 08:08 Aniruddha-JU

The IndicBART checkpoint is 2.4 GB and the pure model is 932 MB.

Aniruddha-JU avatar Aug 25 '22 08:08 Aniruddha-JU

Either.

Use the pure model with the flag --pretrained_model.

Or use the larger model with --pretrained_model plus the additional flag --no_reload_optimizer_ctr_and_scheduler.

The larger checkpoint also contains the optimizer and scheduler states, so you can resume pretraining after a crash. For fine-tuning, resetting the optimizer is the more common choice, so either checkpoint gives you the same starting weights.
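The two options above could look roughly like this on the command line. This is a hedged sketch: the script name (train_nmt.py) and everything besides the two flags from this thread (data paths, task flags, hyperparameters) are placeholders to be filled in from the YANMTT documentation.

```shell
# Option 1: fine-tune from the pure model (weights only).
python train_nmt.py \
    --pretrained_model IndicBART_puremodel \
    # ... your task, data, and hyperparameter flags here

# Option 2: fine-tune from the larger checkpoint, discarding its saved
# optimizer, counter, and scheduler states so fine-tuning starts fresh.
python train_nmt.py \
    --pretrained_model IndicBART \
    --no_reload_optimizer_ctr_and_scheduler \
    # ... your task, data, and hyperparameter flags here
```

Both commands load the same model weights; the second flag only matters because the larger checkpoint bundles training state alongside the weights.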

prajdabre avatar Aug 25 '22 09:08 prajdabre