DeepLearningExamples
FastPitch - attention not conditioned on speaker embedding
Hello. Why doesn't the attention use speaker embeddings when finding the alignment between text and mel-spectrograms? The alignment can vary greatly between speakers who speak in different styles.
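To make the question concrete, here is a minimal NumPy sketch of what I mean by conditioning the alignment on the speaker. It is not the repository's code: the function `soft_alignment`, the projection `spk_proj`, the distance-based scoring, and all shapes are assumptions chosen only for illustration.

```python
import numpy as np

def soft_alignment(text_enc, mel_enc, spk_emb=None, spk_proj=None):
    """Return a (T_mel, T_text) soft alignment whose rows sum to 1.

    Hypothetical sketch: if a speaker embedding is given, it is projected
    and added to every text key before scoring.
    """
    keys = text_enc
    if spk_emb is not None:
        # Assumed conditioning scheme: shift the text keys by a speaker vector.
        keys = keys + spk_emb @ spk_proj
    # Score = negative squared L2 distance between mel frames and text keys.
    diff = mel_enc[:, None, :] - keys[None, :, :]
    logits = -np.sum(diff ** 2, axis=-1)
    # Numerically stable softmax over the text axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
text_enc = rng.normal(size=(5, 8))    # 5 text tokens, 8-dim encodings (made up)
mel_enc = rng.normal(size=(20, 8))    # 20 mel frames, 8-dim encodings (made up)
spk_emb = rng.normal(size=(4,))       # 4-dim speaker embedding (made up)
spk_proj = rng.normal(size=(4, 8))    # hypothetical speaker-to-key projection

plain = soft_alignment(text_enc, mel_enc)
conditioned = soft_alignment(text_enc, mel_enc, spk_emb, spk_proj)
```

With the same text and mel inputs, `plain` and `conditioned` give different attention maps, which is the speaker dependence I would expect the aligner to be able to model.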