
Some doubts about SublayerConnection

Open watersounds opened this issue 2 years ago • 6 comments

According to what you wrote: “That is, the output of each sub-layer is $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ is the function implemented by the sub-layer itself. We apply dropout (cite) to the output of each sub-layer, before it is added to the sub-layer input and normalized.” I think the return value should be `self.norm(x + self.dropout(sublayer(x)))` rather than `x + self.dropout(sublayer(self.norm(x)))`.
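For concreteness, here is a minimal sketch of the two arrangements being compared. The class names are made up for illustration, and `nn.LayerNorm` stands in for the notebook's own `LayerNorm` module; only the second `forward` matches what the notebook's `SublayerConnection` actually returns.

```python
import torch.nn as nn


class PostNormSublayerConnection(nn.Module):
    """Post-norm, as the quoted paper text reads: LayerNorm(x + Sublayer(x))."""

    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # dropout on the sub-layer output, then residual add, then normalize
        return self.norm(x + self.dropout(sublayer(x)))


class PreNormSublayerConnection(nn.Module):
    """Pre-norm, as the notebook's code is written: x + Sublayer(LayerNorm(x))."""

    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # normalize the input first, then apply the sub-layer, dropout, and residual add
        return x + self.dropout(sublayer(self.norm(x)))
```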

Look forward to your reply.

watersounds avatar Sep 19 '22 06:09 watersounds

Where do we write `x + self.dropout(sublayer(self.norm(x)))`? That's not what the passage you quote says.

StellaAthena avatar Sep 24 '22 21:09 StellaAthena

In `the_annotated_transformer.py` on line 357. The function's docstring even notes that the norm was moved.

Bruising6802 avatar Sep 24 '22 22:09 Bruising6802

I have the same question as you. The explanation can be found in #92 .
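For context, the pre-norm arrangement is typically paired with a single final LayerNorm applied after the last layer of the stack, which the notebook's Encoder and Decoder also do. A minimal sketch of that pattern (not the notebook's exact code; it assumes each layer exposes a `size` attribute) looks like this:

```python
import copy

import torch.nn as nn


class PreNormEncoderStack(nn.Module):
    """A stack of pre-norm layers with one LayerNorm applied at the very end."""

    def __init__(self, layer, n_layers):
        super().__init__()
        self.layers = nn.ModuleList([copy.deepcopy(layer) for _ in range(n_layers)])
        # assumes `layer.size` holds the model dimension, as in the notebook
        self.norm = nn.LayerNorm(layer.size)

    def forward(self, x, mask):
        for layer in self.layers:
            x = layer(x, mask)
        # the final norm compensates for each layer normalizing its input
        # rather than its output
        return self.norm(x)
```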

lvXiangwei avatar Sep 26 '22 06:09 lvXiangwei

Maybe it's best to mention this issue in the notebook, because it causes confusion for many.

Bruising6802 avatar Sep 26 '22 06:09 Bruising6802

> Where do we write `x + self.dropout(sublayer(self.norm(x)))`? That's not what the passage you quote says.

https://github.com/harvardnlp/annotated-transformer/blob/debc9fd747bb2123160a98046ad1c2d4da44a567/the_annotated_transformer.py#L357

watersounds avatar Sep 26 '22 08:09 watersounds