Transformer-TTS
About FFN in your code
def forward(self, input_):
    # FFN Network
    x = input_.transpose(1, 2)
    x = self.w_2(t.relu(self.w_1(x)))
    x = x.transpose(1, 2)
    # residual connection
    x = x + input_
    # dropout
    x = self.dropout(x)
    # layer normalization
    x = self.layer_norm(x)
    return x
Should dropout be placed before x + input_?
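For reference, a minimal sketch of the ordering the question suggests (dropout applied to the sub-layer output before the residual addition, as in the original Transformer), keeping the same attribute names as the code above; this is only an illustration, not the repository's current implementation:

def forward(self, input_):
    # FFN Network
    x = input_.transpose(1, 2)
    x = self.w_2(t.relu(self.w_1(x)))
    x = x.transpose(1, 2)
    # residual dropout: apply dropout to the sub-layer output first
    x = self.dropout(x)
    # residual connection, then layer normalization
    x = self.layer_norm(x + input_)
    return x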
Hi, I will experiment with your advice. Thank you.
I experimented with your advice, but the diagonal alignment did not appear correctly in the attention plot.
It's weird, because in the Transformer the residual dropout should be placed before the residual connection.
In the paper: "Residual Dropout: We apply dropout [27] to the output of each sub-layer, before it is added to the sub-layer input and normalized."
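In other words, each sub-layer in the paper computes LayerNorm(x + Dropout(Sublayer(x))), i.e. dropout acts on the sub-layer output before the residual addition and the layer normalization.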
I think it's weird too... I should try some more experiments with dropout.
Yes. At least it shows that dropout is very important in Transformer-TTS, but it should be used in a different way. Thanks for sharing.