blank_language_model
Sharing of embeddings
Hi all,
I was wondering what the benefits are of sharing the word embedding and output projection weights when training a BLM model. Would you suggest using it as the default setting when training the BLM, or are we better off treating it as a hyperparameter to tune?
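For reference, by "sharing" I mean weight tying: using one matrix both to embed input tokens and, transposed, to project hidden states to vocabulary logits. A minimal NumPy sketch of the idea (hypothetical dimensions and function names, not the actual BLM code):

```python
import numpy as np

vocab_size, d_model = 10, 4
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, d_model))  # single shared weight matrix

def embed(token_ids):
    # Input side: look up rows of E as token embeddings.
    return E[token_ids]

def project(hidden):
    # Output side: reuse E (transposed) as the projection to logits,
    # instead of a separate (d_model, vocab_size) matrix.
    return hidden @ E.T

h = embed(np.array([1, 2]))   # shape (2, d_model)
logits = project(h)           # shape (2, vocab_size)
```

With tying, the model has `vocab_size * d_model` fewer parameters, and gradients from both the input and output sides update the same matrix.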
Thank you all :)