pytorch-seq2seq
Tutorial 6: PositionWiseFeedforwardLayer - fc_2 activation function
Hi, thank you so much for this great tutorial!
This may be a silly question, but in PositionWiseFeedforwardLayer() why does the second linear layer (fc_2) not need an activation function?
Thank you, I really appreciate the help!
I don't think I have a great explanation for this, but the second linear layer is mainly there to project the vectors from pf_dim back down to hid_dim, not to apply a non-linearity; the non-linearity has already been applied after fc_1.
This is also true of PyTorch's official Transformer implementation, see: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/transformer.py#L297, so it's not unique to this tutorial.
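For reference, here's a minimal sketch of a position-wise feedforward layer in the same style as the tutorial (the exact argument names and layout are assumptions, not a copy of the tutorial code). The ReLU sits after fc_1, and fc_2 is just a linear projection back to hid_dim with no activation:

```python
import torch
import torch.nn as nn

class PositionwiseFeedforwardLayer(nn.Module):
    def __init__(self, hid_dim, pf_dim, dropout):
        super().__init__()
        self.fc_1 = nn.Linear(hid_dim, pf_dim)  # expand: hid_dim -> pf_dim
        self.fc_2 = nn.Linear(pf_dim, hid_dim)  # project back: pf_dim -> hid_dim
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x = [batch size, seq len, hid dim]
        x = self.dropout(torch.relu(self.fc_1(x)))  # non-linearity applied here, after fc_1
        # x = [batch size, seq len, pf dim]
        x = self.fc_2(x)  # no activation after fc_2; the output goes into the residual + layer norm
        # x = [batch size, seq len, hid dim]
        return x
```

This matches the original Transformer paper's formulation, FFN(x) = max(0, xW_1 + b_1)W_2 + b_2, where the second projection is also purely linear.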
It would be interesting to see what, if any, effect adding an activation after fc_2 would have. If you do any experiments on this it would be cool if you could update me.