
Question about the Training Strategy

Open · cooelf opened this issue on Oct 29, 2019 · 0 comments

Hi! Thanks for your nice work. I am interested in the training strategy shown in the paper,

"we first fine-tune the BERT model, then freeze BERT to fine-tune the glyph layer,and finally jointly tune both layers until convergence. "

Could you give more details? I am not sure how the training starts. Do you first fine-tune the BERT model while keeping the glyph layer frozen inside the glyce_bert model, or do you fine-tune a standalone BERT-only model, then load and freeze those weights in the glyce_bert model while fine-tuning the glyph layer? And how many epochs do you train for in each stage?
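
To make the first reading concrete, here is a rough PyTorch sketch of what I have in mind. The `bert` / `glyph_layer` submodule names, the `train_one_stage` helper, and the epoch counts are placeholders I made up for illustration, not names from the Glyce code:

```python
import torch.nn as nn


def set_requires_grad(module: nn.Module, flag: bool) -> None:
    """Enable or disable gradients for every parameter in a submodule."""
    for p in module.parameters():
        p.requires_grad = flag


def staged_training(model, train_one_stage, stage_epochs=(2, 2, 2)):
    """Three-stage schedule as described in the paper quote.

    `train_one_stage(model, epochs)` stands in for an ordinary training
    loop; the `stage_epochs` values are invented, since how many epochs
    each stage runs is exactly what I am asking about.
    """
    # Stage 1: fine-tune BERT only, glyph layer frozen.
    set_requires_grad(model.bert, True)
    set_requires_grad(model.glyph_layer, False)
    train_one_stage(model, epochs=stage_epochs[0])

    # Stage 2: freeze BERT, fine-tune the glyph layer.
    set_requires_grad(model.bert, False)
    set_requires_grad(model.glyph_layer, True)
    train_one_stage(model, epochs=stage_epochs[1])

    # Stage 3: jointly tune both until convergence.
    set_requires_grad(model.bert, True)
    set_requires_grad(model.glyph_layer, True)
    train_one_stage(model, epochs=stage_epochs[2])
```

Is something like this what the paper means, or is stage 1 done on a separate BERT-only model whose weights are then loaded?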

Looking forward to your reply!

cooelf · Oct 29 '19, 03:10