blank_language_model
Sharing of embeddings
Hi all,
I was wondering what the benefits are of sharing the word embedding and output projection weights when training a BLM model. Would you suggest using it as the default setting when training the BLM, or are we better off treating it as a hyperparameter to tune?
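For reference, by "sharing" I mean weight tying: using one matrix both to embed input tokens and, transposed, to project hidden states to vocabulary logits. A minimal NumPy sketch of the idea (hypothetical dimensions and function names, not the actual BLM code):

```python
import numpy as np

vocab_size, d_model = 10, 4
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, d_model))  # single shared weight matrix

def embed(token_ids):
    # Input side: look up rows of E as token embeddings.
    return E[token_ids]

def project(hidden):
    # Output side: reuse E (transposed) as the projection to logits,
    # instead of a separate (d_model, vocab_size) matrix.
    return hidden @ E.T

h = embed(np.array([1, 2]))   # shape (2, d_model)
logits = project(h)           # shape (2, vocab_size)
```

With tying, the model has `vocab_size * d_model` fewer parameters, and gradients from both the input and output sides update the same matrix.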
Thank you all :)