attention-is-all-you-need-pytorch
My question
if trg_emb_prj_weight_sharing:
    # Share the weight between target word embedding & last dense layer
    self.trg_word_prj.weight = self.decoder.trg_word_emb.weight

if emb_src_trg_weight_sharing:
    self.encoder.src_word_emb.weight = self.decoder.trg_word_emb.weight
The code above is meant to implement weight sharing, but I'm confused: the embedding layer and the linear layer seem to have differently shaped weights. How can this assignment work?
I just found the relevant information in the PyTorch docs (see the attached picture). It shows that for fc = nn.Linear(d_model, n_trg_vocab), the shape of fc.weight is actually (n_trg_vocab, d_model)!
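In other words, nn.Linear stores its weight as (out_features, in_features), which is exactly the same shape as nn.Embedding(num_embeddings, embedding_dim) stores its weight, so the Parameter can be assigned directly. Here is a minimal sketch that checks this; the sizes d_model=512 and n_trg_vocab=32000 are just illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

d_model, n_trg_vocab = 512, 32000  # illustrative sizes

emb = nn.Embedding(n_trg_vocab, d_model)            # weight shape: (n_trg_vocab, d_model)
prj = nn.Linear(d_model, n_trg_vocab, bias=False)   # weight shape: (n_trg_vocab, d_model) as well

print(emb.weight.shape)  # torch.Size([32000, 512])
print(prj.weight.shape)  # torch.Size([32000, 512])

# Shapes match, so the same Parameter can be shared between the two layers:
prj.weight = emb.weight

x = torch.randint(0, n_trg_vocab, (2, 7))  # dummy batch of token ids
logits = prj(emb(x))                       # (2, 7, n_trg_vocab)
print(logits.shape)
```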
Thank you for your answer, but I had already figured it out myself a few days later. Thanks anyway! 😂