Transformer-TTS
Multihead attention implementation
```python
# Concatenate context vector with input (most important)
result = t.cat([decoder_input, result], dim=-1)
```
Excuse me. I don't think I have seen the multi-head context vector being concatenated with the original input when doing self-attention, and you commented it as important. I guess I am missing something? Do you mind if I ask which paper you referred to when implementing this part of multihead attention?
Same question here. I didn't see any reference to this in the Transformer-TTS paper https://arxiv.org/abs/1809.08895. EDIT: It might be linked to "The multi-head attention can integrate the encoder hidden states in multiple perspectives and generate better context vectors" in section 3.6 of the paper. Not sure of my interpretation.
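For anyone else following this thread, here is a minimal sketch of that interpretation: the multi-head context vector is concatenated with the original decoder-side input and projected back to the model dimension, which would fit the section 3.6 wording about integrating encoder hidden states into better context vectors. This is an assumption about what the repo's code is doing; the class, parameter names, and shapes below are illustrative, not taken from the actual implementation.

```python
import torch as t
import torch.nn as nn


class MultiheadAttentionWithConcat(nn.Module):
    """Sketch: multi-head attention whose context vector is concatenated
    with the original query input before the output projection, mirroring
    the line in question. Names/shapes are assumptions, not the repo's code."""

    def __init__(self, num_hidden, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = num_hidden // num_heads
        self.key = nn.Linear(num_hidden, num_hidden, bias=False)
        self.value = nn.Linear(num_hidden, num_hidden, bias=False)
        self.query = nn.Linear(num_hidden, num_hidden, bias=False)
        # The output projection takes the concatenated [input, context]
        # vector, hence the doubled input dimension.
        self.out = nn.Linear(num_hidden * 2, num_hidden)

    def forward(self, memory, decoder_input):
        B, T_k, H = memory.size()
        T_q = decoder_input.size(1)

        # Project and split into heads: (B, heads, T, head_dim)
        k = self.key(memory).view(B, T_k, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.value(memory).view(B, T_k, self.num_heads, self.head_dim).transpose(1, 2)
        q = self.query(decoder_input).view(B, T_q, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled dot-product attention per head
        attn = t.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)

        # Merge heads back into a context vector of shape (B, T_q, H)
        result = (attn @ v).transpose(1, 2).contiguous().view(B, T_q, H)

        # Concatenate context vector with input (the line being discussed),
        # then project back down to the model dimension.
        result = t.cat([decoder_input, result], dim=-1)
        return self.out(result)
```

If this reading is right, the concatenation plus the wider `out` projection acts like a residual-style shortcut that lets the layer mix the raw decoder input with the attended encoder states, rather than relying on the context vector alone.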