Transformer-TTS
About FFN in your code
def forward(self, input_):
    # FFN Network
    x = input_.transpose(1, 2)
    x = self.w_2(t.relu(self.w_1(x)))
    x = x.transpose(1, 2)
    # residual connection
    x = x + input_
    # dropout
    x = self.dropout(x)
    # layer normalization
    x = self.layer_norm(x)
    return x
Should dropout be placed before x + input_?
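For reference, a minimal sketch of the ordering the question suggests (dropout applied to the sub-layer output before the residual addition, as in the original Transformer), keeping the same attribute names as the code above; this is only an illustration, not the repository's current implementation:

def forward(self, input_):
    # FFN Network
    x = input_.transpose(1, 2)
    x = self.w_2(t.relu(self.w_1(x)))
    x = x.transpose(1, 2)
    # residual dropout: apply dropout to the sub-layer output first
    x = self.dropout(x)
    # residual connection, then layer normalization
    x = self.layer_norm(x + input_)
    return x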
Hi, I will experiment with your advice. Thank you.
I experimented with your advice, but the diagonal alignment did not appear correctly in the attention plot.
It's weird, because in the Transformer the residual dropout should be placed before the residual connection.
In the paper: "Residual Dropout: We apply dropout [27] to the output of each sub-layer, before it is added to the sub-layer input and normalized."
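In other words, each sub-layer in the paper computes LayerNorm(x + Dropout(Sublayer(x))), i.e. dropout acts on the sub-layer output before the residual addition and the layer normalization.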
I think it's weird too... I should try some more experiments with dropout.
Yes. At least it shows that dropout is very important in Transformer-TTS, but it should be used in a different way. Thanks for sharing.