structured-neural-summarization-replication
Embeddings through ReLU: an unconventional decision
This is not necessarily wrong, but I want to point out that passing embeddings through a ReLU is not a very common choice as far as I know. It might not hurt anything, but if training underperforms, this would be one thing to check.
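For reference, a minimal sketch of the two variants (sizes and tensor names here are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the repo's actual dimensions may differ.
embedding = nn.Embedding(1000, 128)
tokens = torch.tensor([4, 8, 15])

# What the code does now: clamp negative embedding values to zero.
embedded_relu = torch.relu(embedding(tokens))

# The more common pattern: pass the raw embeddings to the next layer
# (optionally with dropout or scaling, but no ReLU).
embedded = embedding(tokens)
```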
Also: you can tie the input and output embedding matrices (i.e. share a single parameter between self.embedding.weight and self.out). This will roughly halve the number of embedding-related parameters and might help a bit with overfitting. Note that you would still need the bias, which is included in the self.out layer.
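A minimal sketch of what tying could look like, assuming self.embedding is an nn.Embedding and self.out is an nn.Linear as in your code (the surrounding class and argument names are hypothetical); note this requires the embedding dimension to equal the decoder hidden size:

```python
import torch.nn as nn

class TiedDecoder(nn.Module):
    # Hypothetical wrapper; only self.embedding and self.out mirror your code.
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        # The output layer keeps its own bias term.
        self.out = nn.Linear(hidden_size, vocab_size, bias=True)
        # Tie the two matrices: both names now refer to one shared
        # parameter, halving the embedding-related parameter count.
        self.out.weight = self.embedding.weight
```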