pytorch-original-transformer

sharing weight matrix between the two embedding layers and the pre-softmax linear transformation

Open nataly-obr opened this issue 3 years ago • 0 comments

Hi, thanks for your repo, it helps a lot! In the paper, the weight matrix is shared between the two embedding layers and the pre-softmax linear transformation: "In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to [30]." (page 5, Section 3.4, Embeddings and Softmax)
Would it be correct to modify the following rows in transformer_model.py to something like this?

Rows 32-33:
```python
self.src_embedding = self.trg_embedding = Embedding(src_vocab_size, model_dimension)
```
Row 50:
```python
self.decoder_generator = DecoderGenerator(self.src_embedding.embeddings_table.weight)
```
Row 221:
```python
def __init__(self, shared_embedding_weights):
```
Row 224:
```python
self.linear = nn.Linear(shared_embedding_weights.size()[1], shared_embedding_weights.size()[0], bias=False)
del self.linear.weight
self.shared_embedding_weights = shared_embedding_weights
```
Row 232:
```python
self.linear.weight = self.shared_embedding_weights
```
Row 233:
```python
return self.log_softmax(self.linear(trg_representations_batch) * math.sqrt(self.shared_embedding_weights.size()[1]))
```
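For reference, below is a minimal, self-contained sketch of the weight-tying idea from Section 3.4, independent of the repo's code. The class and method names (`TiedEmbeddingGenerator`, `embed`, `generate`) are illustrative, not classes from transformer_model.py. It relies on the fact that `nn.Linear` stores its weight as `(out_features, in_features)`, so the embedding table of shape `(vocab_size, model_dimension)` can be assigned to the pre-softmax projection directly:

```python
import math

import torch
import torch.nn as nn


class TiedEmbeddingGenerator(nn.Module):
    """Minimal sketch of weight tying between the (shared) embedding table and
    the pre-softmax linear projection, as in 'Attention Is All You Need', Sec. 3.4.
    Names here are illustrative and do not mirror the repo's classes."""

    def __init__(self, vocab_size, model_dimension):
        super().__init__()
        self.model_dimension = model_dimension
        # One embedding table, shared by source and target embeddings.
        self.shared_embedding = nn.Embedding(vocab_size, model_dimension)
        # Pre-softmax projection without its own weight or bias...
        self.linear = nn.Linear(model_dimension, vocab_size, bias=False)
        # ...tied to the embedding table: both point at the same Parameter,
        # so gradients from both uses accumulate in one tensor.
        self.linear.weight = self.shared_embedding.weight
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def embed(self, token_ids):
        # The paper scales the embedding outputs by sqrt(d_model) (Sec. 3.4).
        return self.shared_embedding(token_ids) * math.sqrt(self.model_dimension)

    def generate(self, trg_representations_batch):
        # Pre-softmax projection with the tied weights, then log-softmax over the vocab.
        return self.log_softmax(self.linear(trg_representations_batch))


if __name__ == "__main__":
    # Quick sanity check of the tying: shapes and shared storage.
    m = TiedEmbeddingGenerator(vocab_size=100, model_dimension=16)
    assert m.linear.weight is m.shared_embedding.weight
    logits = m.generate(torch.randn(2, 5, 16))  # (batch, seq_len, vocab_size)
    print(logits.shape)
```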

nataly-obr · Jan 19 '22 17:01