glyce
Question about the Training Strategy
Hi! Thanks for your nice work. I am interested in the training strategy described in the paper:
"we first fine-tune the BERT model, then freeze BERT to fine-tune the glyph layer, and finally jointly tune both layers until convergence."
Could you give more details? I am not sure how the training is started. In the first stage, do you fine-tune the BERT model while freezing the glyph layer inside the glyce_bert model, or do you first fine-tune a BERT-only model, then load its weights into the glyce_bert model and freeze them while fine-tuning the glyph layer? And how many epochs do you train for in each stage?
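To make my first interpretation concrete, here is a minimal PyTorch-style sketch of the staged freezing I have in mind. The module names (`bert`, `glyph_layer`) and the toy class are my own assumptions for illustration, not the actual classes in this repo:

```python
import torch.nn as nn

# Toy stand-in; the real glyce_bert model's submodule names are assumptions on my part.
class ToyGlyceBert(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = nn.Linear(768, 768)        # placeholder for the BERT encoder
        self.glyph_layer = nn.Linear(64, 768)  # placeholder for the glyph embedding layer

def set_requires_grad(module, flag):
    """Enable or disable gradients for all parameters of a sub-module."""
    for p in module.parameters():
        p.requires_grad = flag

model = ToyGlyceBert()

# Stage 1: fine-tune BERT with the glyph layer frozen
set_requires_grad(model.bert, True)
set_requires_grad(model.glyph_layer, False)
# ... train for some epochs ...

# Stage 2: freeze BERT, fine-tune the glyph layer
set_requires_grad(model.bert, False)
set_requires_grad(model.glyph_layer, True)
# ... train for some epochs ...

# Stage 3: unfreeze everything and jointly tune until convergence
set_requires_grad(model.bert, True)
set_requires_grad(model.glyph_layer, True)
# ... train until the dev metric stops improving ...
```

Is this roughly what you did (all stages inside the combined glyce_bert model), or did the first stage use a separate BERT-only model whose weights were loaded afterwards?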
Looking forward to your reply!