annotated-transformer
Some doubts about SublayerConnection
According to what you wrote:
“That is, the output of each sub-layer is $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ is the function implemented by the sub-layer itself. We apply dropout (cite) to the output of each sub-layer, before it is added to the sub-layer input and normalized.”
I think the return value should be `self.norm(x + self.dropout(sublayer(x)))` rather than `x + self.dropout(sublayer(self.norm(x)))`.
Looking forward to your reply.
Where do we write `x + self.dropout(sublayer(self.norm(x)))`? That's not what the passage you quote says.
In `the_annotated_transformer.py`, on line 357. The function documentation even says that the norm was moved.
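To make the difference concrete, here is a minimal sketch of the two variants being discussed. The class names and the use of `torch.nn.LayerNorm` are illustrative only (the notebook defines its own `LayerNorm` module); this is not the repo's exact code.

```python
import torch
import torch.nn as nn


class PostNormSublayerConnection(nn.Module):
    """Post-norm, as the paper's text describes: LayerNorm(x + Sublayer(x))."""

    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Dropout on the sub-layer output, add the residual, then normalize.
        return self.norm(x + self.dropout(sublayer(x)))


class PreNormSublayerConnection(nn.Module):
    """Pre-norm, as in the notebook's code: x + Dropout(Sublayer(LayerNorm(x)))."""

    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Normalize first, apply the sub-layer and dropout, then add the residual.
        return x + self.dropout(sublayer(self.norm(x)))


if __name__ == "__main__":
    x = torch.randn(2, 5, 512)
    ff = nn.Linear(512, 512)  # stand-in for a real sub-layer (attention or feed-forward)
    print(PostNormSublayerConnection(512, 0.1)(x, ff).shape)
    print(PreNormSublayerConnection(512, 0.1)(x, ff).shape)
```

The quoted passage describes the first (post-norm) form, while `SublayerConnection.forward` in the notebook returns the second (pre-norm) form, with the docstring noting that the norm is applied first for code simplicity.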
I have the same question as you. The explanation can be found in #92.
Maybe it's best to mention this issue in the notebook, because it causes confusion for many.
> Where do we write `x + self.dropout(sublayer(self.norm(x)))`? That's not what the passage you quote says.
https://github.com/harvardnlp/annotated-transformer/blob/debc9fd747bb2123160a98046ad1c2d4da44a567/the_annotated_transformer.py#L357