structured-neural-summarization-replication
Embeddings through ReLU: an unconventional decision
This is not necessarily wrong, but I want to point out that passing embeddings through a ReLU is not a very common choice as far as I know. It might not hurt anything, but if training underperforms, this would be one thing to check.
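For reference, a minimal sketch of the two variants (sizes and tensor names here are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the repo's actual dimensions may differ.
embedding = nn.Embedding(1000, 128)
tokens = torch.tensor([4, 8, 15])

# What the code does now: clamp negative embedding values to zero.
embedded_relu = torch.relu(embedding(tokens))

# The more common pattern: pass the raw embeddings to the next layer
# (optionally with dropout or scaling, but no ReLU).
embedded = embedding(tokens)
```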
Also: you can tie the input and output embedding matrices (i.e. share a single parameter between self.embedding.weight and self.out). This will roughly halve the number of embedding-related parameters and might help a bit with overfitting. Note that you would still need the bias, which is included in the self.out layer.
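A minimal sketch of what tying could look like, assuming self.embedding is an nn.Embedding and self.out is an nn.Linear as in your code (the surrounding class and argument names are hypothetical); note this requires the embedding dimension to equal the decoder hidden size:

```python
import torch.nn as nn

class TiedDecoder(nn.Module):
    # Hypothetical wrapper; only self.embedding and self.out mirror your code.
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        # The output layer keeps its own bias term.
        self.out = nn.Linear(hidden_size, vocab_size, bias=True)
        # Tie the two matrices: both names now refer to one shared
        # parameter, halving the embedding-related parameter count.
        self.out.weight = self.embedding.weight
```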