pytorch-seq2seq
Tutorial 6: PositionWiseFeedforwardLayer - fc_2 activation function
Hi, thank you so much for this great tutorial!
This may be a silly question, but in PositionWiseFeedforwardLayer() why does the second linear layer (fc_2) not need an activation function?
Thank you, I really appreciate the help!
I don't think I have a great explanation for this, but the second linear layer is mainly there to project the vectors from pf_dim back down to hid_dim, not to apply a non-linearity; the non-linearity has already been applied after fc_1.
This is also true of PyTorch's official Transformer implementation, see: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/transformer.py#L297, so it's not unique to this tutorial.
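For reference, here's a minimal sketch of a position-wise feedforward layer in the same style as the tutorial (the exact argument names and layout are assumptions, not a copy of the tutorial code). The ReLU sits after fc_1, and fc_2 is just a linear projection back to hid_dim with no activation:

```python
import torch
import torch.nn as nn

class PositionwiseFeedforwardLayer(nn.Module):
    def __init__(self, hid_dim, pf_dim, dropout):
        super().__init__()
        self.fc_1 = nn.Linear(hid_dim, pf_dim)  # expand: hid_dim -> pf_dim
        self.fc_2 = nn.Linear(pf_dim, hid_dim)  # project back: pf_dim -> hid_dim
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x = [batch size, seq len, hid dim]
        x = self.dropout(torch.relu(self.fc_1(x)))  # non-linearity applied here, after fc_1
        # x = [batch size, seq len, pf dim]
        x = self.fc_2(x)  # no activation after fc_2; the output goes into the residual + layer norm
        # x = [batch size, seq len, hid dim]
        return x
```

This matches the original Transformer paper's formulation, FFN(x) = max(0, xW_1 + b_1)W_2 + b_2, where the second projection is also purely linear.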
It would be interesting to see what, if any, effect adding an activation after fc_2 would have. If you do any experiments on this it would be cool if you could update me.