
Tie the input and output embedding?

Open · jiqiujia opened this issue · 5 comments

I think it's reasonable to tie the input and output embeddings, especially since the output embedding is applied at every token position. But I can't figure out a way to do this. Could anyone give an idea?

jiqiujia · Oct 29 '18 05:10

Hmmm, what do you mean by the output embedding? Do you mean the softmaxed output distribution?

codertimo · Oct 29 '18 05:10

The output embedding is the linear layer in MaskedLanguageModel. I made a mistake earlier: that output projection is already shared across token positions. So it should be easy to tie the input embedding and the output embedding.
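
Roughly what I have in mind, as a PyTorch sketch (the `token_embedding` argument is just mine for illustration, it's not in the current constructor):

```python
import torch.nn as nn


class MaskedLanguageModel(nn.Module):
    """Predict the original token at each masked position."""

    def __init__(self, hidden, vocab_size, token_embedding=None):
        super().__init__()
        self.linear = nn.Linear(hidden, vocab_size)
        # nn.Embedding(vocab_size, hidden).weight and this linear layer's weight
        # are both [vocab_size, hidden], so the tensor can simply be shared (tied).
        if token_embedding is not None:
            self.linear.weight = token_embedding.weight
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, x):
        return self.softmax(self.linear(x))
```

The bias of the linear layer stays separate; only the weight matrix is shared.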

jiqiujia · Oct 29 '18 06:10

Is there any benefit to tying the two layers' weights? If so, could you point me to some references that use a similar architecture?

codertimo · Oct 29 '18 13:10

Here's a paper: https://arxiv.org/abs/1608.05859

With tying, the memory requirement is lower and training should be faster (I believe).
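
For a rough sense of the saving (the sizes below are just illustrative, not this repo's defaults):

```python
# Parameters in the embedding/projection pair, untied vs. tied.
vocab_size, hidden = 30000, 768  # illustrative sizes only

untied = 2 * vocab_size * hidden  # separate input embedding + output projection
tied = vocab_size * hidden        # one shared weight matrix

print(f"untied: {untied:,} params, tied: {tied:,} params")
# untied: 46,080,000 params, tied: 23,040,000 params
```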

briandw · Oct 29 '18 22:10

@jiqiujia @briandw Cool, I'll implement it in version 0.0.1a5, but it seems like solving #32 is a higher priority.

codertimo · Oct 30 '18 03:10