transformer
Weight matrix sharing confusion
In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation.
Hello! I read the paper recently and found that this part is mentioned in Section 3.4, Embeddings and Softmax, but your code seems to treat the output embedding layer and the pre-softmax linear layer as separate modules with separate weights. I want to ask what your consideration was for this part.
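For reference, this is roughly what I would expect the tying to look like in PyTorch. It is only a minimal sketch of the idea, not taken from your code; the class and method names (`TiedEmbeddingGenerator`, `embed_tokens`, `generate`) are hypothetical.

```python
import torch.nn as nn


class TiedEmbeddingGenerator(nn.Module):
    """Sketch of weight tying: the pre-softmax projection reuses the embedding matrix."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)        # weight: (vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size, bias=False)  # weight: (vocab_size, d_model)
        # Tie the two: both roles now share one and the same weight tensor.
        self.proj.weight = self.embed.weight

    def embed_tokens(self, tokens):
        # Section 3.4 also multiplies the embeddings by sqrt(d_model).
        return self.embed(tokens) * (self.embed.embedding_dim ** 0.5)

    def generate(self, hidden):
        # hidden: (batch, seq, d_model) -> logits over the vocabulary.
        return self.proj(hidden)
```

If I understand the paper correctly, the same matrix would also be shared between the encoder and decoder input embeddings, whereas in the current code each of the three is an independent parameter. Is that difference intentional?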