Transformer-TTS
Multihead attention implementation
```python
# Concatenate context vector with input (most important)
result = t.cat([decoder_input, result], dim=-1)
```
Excuse me. I don't think I have seen the multi-head context vector being concatenated with the original input when doing self-attention, and you commented it as important. I guess I am missing something? Do you mind if I ask which paper you referred to when implementing this part of multihead attention?
Same question here. I didn't see any reference to this in the Transformer-TTS paper https://arxiv.org/abs/1809.08895. EDIT: It might be linked to "The multi-head attention can integrate the encoder hidden states in multiple perspectives and generate better context vectors" in section 3.6 of the paper. Not sure of my interpretation.
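For anyone else following this thread, here is a minimal sketch of that interpretation: the multi-head context vector is concatenated with the original decoder-side input and projected back to the model dimension, which would fit the section 3.6 wording about integrating encoder hidden states into better context vectors. This is an assumption about what the repo's code is doing; the class, parameter names, and shapes below are illustrative, not taken from the actual implementation.

```python
import torch as t
import torch.nn as nn


class MultiheadAttentionWithConcat(nn.Module):
    """Sketch: multi-head attention whose context vector is concatenated
    with the original query input before the output projection, mirroring
    the line in question. Names/shapes are assumptions, not the repo's code."""

    def __init__(self, num_hidden, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = num_hidden // num_heads
        self.key = nn.Linear(num_hidden, num_hidden, bias=False)
        self.value = nn.Linear(num_hidden, num_hidden, bias=False)
        self.query = nn.Linear(num_hidden, num_hidden, bias=False)
        # The output projection takes the concatenated [input, context]
        # vector, hence the doubled input dimension.
        self.out = nn.Linear(num_hidden * 2, num_hidden)

    def forward(self, memory, decoder_input):
        B, T_k, H = memory.size()
        T_q = decoder_input.size(1)

        # Project and split into heads: (B, heads, T, head_dim)
        k = self.key(memory).view(B, T_k, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.value(memory).view(B, T_k, self.num_heads, self.head_dim).transpose(1, 2)
        q = self.query(decoder_input).view(B, T_q, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled dot-product attention per head
        attn = t.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)

        # Merge heads back into a context vector of shape (B, T_q, H)
        result = (attn @ v).transpose(1, 2).contiguous().view(B, T_q, H)

        # Concatenate context vector with input (the line being discussed),
        # then project back down to the model dimension.
        result = t.cat([decoder_input, result], dim=-1)
        return self.out(result)
```

If this reading is right, the concatenation plus the wider `out` projection acts like a residual-style shortcut that lets the layer mix the raw decoder input with the attended encoder states, rather than relying on the context vector alone.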