Why would freezing entities/predicates for most of the epochs work well?
Hello,
Thank you very much for the wonderful work and the codebase! I'm also interested in NTP and am trying to build on top of it. However, I'm confused about why GNTP works well when the entity/relation embeddings are frozen for most of the epochs (e.g., 95/100 for FB122). In the appendix, the paper mentions:
"On FB122, we found it useful to pre-train rules first (95 epochs), without updating any entity or relation embeddings, and then training the entity embeddings jointly with the rules (5 epochs). This forces GNTPs to learn a good rule-based model of the domain before fine-tuning its representations."
I can imagine that it may be possible to learn rule templates with randomly initialized entity/predicate embeddings. However, since the unification scores are computed from the embeddings themselves, I don't understand why freezing the entity/relation embeddings for 95% of training benefits the model. Is there an explanation for this?
Thank you very much!