
Tie the input and output embedding?

Open · jiqiujia opened this issue · 5 comments

I think it's reasonable to tie the input and output embeddings, especially since the output embedding is applied at every token position. But I can't figure out a way to do this. Could anyone give an idea?

jiqiujia · Oct 29 '18 05:10

Hmmm, what do you mean by the output embedding? Do you mean the softmaxed output distribution?

codertimo · Oct 29 '18 05:10

The output embedding is the linear layer in MaskedLanguageModel. I made a mistake earlier: that output projection is already shared across token positions. So it should be easy to tie the input embedding and the output embedding.
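
Roughly what I have in mind, as a PyTorch sketch (the `token_embedding` argument is just mine for illustration, it's not in the current constructor):

```python
import torch.nn as nn


class MaskedLanguageModel(nn.Module):
    """Predict the original token at each masked position."""

    def __init__(self, hidden, vocab_size, token_embedding=None):
        super().__init__()
        self.linear = nn.Linear(hidden, vocab_size)
        # nn.Embedding(vocab_size, hidden).weight and this linear layer's weight
        # are both [vocab_size, hidden], so the tensor can simply be shared (tied).
        if token_embedding is not None:
            self.linear.weight = token_embedding.weight
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, x):
        return self.softmax(self.linear(x))
```

The bias of the linear layer stays separate; only the weight matrix is shared.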

jiqiujia · Oct 29 '18 06:10

Is there any benefit to tying the two layers' weights? If so, could you point me to some references that use a similar architecture?

codertimo · Oct 29 '18 13:10

Here's a paper: https://arxiv.org/abs/1608.05859

With tying, the memory requirement is lower and training should be faster (I believe).
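
For a rough sense of the saving (the sizes below are just illustrative, not this repo's defaults):

```python
# Parameters in the embedding/projection pair, untied vs. tied.
vocab_size, hidden = 30000, 768  # illustrative sizes only

untied = 2 * vocab_size * hidden  # separate input embedding + output projection
tied = vocab_size * hidden        # one shared weight matrix

print(f"untied: {untied:,} params, tied: {tied:,} params")
# untied: 46,080,000 params, tied: 23,040,000 params
```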

briandw · Oct 29 '18 22:10

@jiqiujia @briandw Cool, I'll implement it in version 0.0.1a5, but it seems like solving #32 is a higher priority.

codertimo · Oct 30 '18 03:10