Adriana STAN
Just to improve the speaker control.
Is it noise-noise or speech-like noise? Is the symbol list you use at training the same as the one used at inference? Is the transcription correct and aligned with the...
I am afraid I cannot share the dataset, but I did downsample the audio, trim the silence, and normalise the volume. Maybe you can try using one of the single-speaker...
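In case it helps, here is a minimal sketch of that kind of preprocessing; the target sample rate, trim threshold, and function names are assumptions, not the exact pipeline used:

```python
import librosa
import soundfile as sf

def preprocess(in_path, out_path, target_sr=22050, top_db=40):
    # Downsample on load (22.05 kHz is an assumed target, not necessarily the one used)
    audio, _ = librosa.load(in_path, sr=target_sr)
    # Trim leading/trailing silence below the chosen dB threshold
    audio, _ = librosa.effects.trim(audio, top_db=top_db)
    # Peak-normalise the volume
    audio = audio / max(abs(audio).max(), 1e-8)
    sf.write(out_path, audio, target_sr)
```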
Thanks for your reply! We are using 50 speakers with 200 utts/speaker, and it still changes the identity. We are now retraining using the ideas here: https://github.com/NVIDIA/DeepLearningExamples/issues/707#issuecomment-727021066
Hi @alancucki,

So we tried all the methods mentioned so far:
- balancing the data
- adding the speaker conditioning on the decoder side as well (see the sketch below)
- using two attention...
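A rough sketch of what the decoder-side speaker conditioning could look like (the module and argument names here are hypothetical, not the actual FastPitch code):

```python
import torch
import torch.nn as nn

class SpeakerConditionedDecoderInput(nn.Module):
    """Projects a speaker embedding and adds it to the decoder input (hypothetical sketch)."""
    def __init__(self, n_speakers, d_model, d_spk=64):
        super().__init__()
        self.spk_emb = nn.Embedding(n_speakers, d_spk)
        self.proj = nn.Linear(d_spk, d_model)

    def forward(self, dec_in, speaker_ids):
        # dec_in: [B, T, d_model]; speaker_ids: [B]
        spk = self.proj(self.spk_emb(speaker_ids))   # [B, d_model]
        return dec_in + spk.unsqueeze(1)             # broadcast over the time axis
```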
But this is what happens now in FastPitch: https://github.com/NVIDIA/DeepLearningExamples/blob/afea561ecff80b82f17316a0290f6f34c486c9a5/PyTorch/SpeechSynthesis/FastPitch/fastpitch/transformer.py#L207
Ok, got it. Still, this only means that instead of having equal weights in the summation, the network learns the individual summation weights. So did you try this and got...
Ok, great, I will give it a try. Thanks!
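For reference, the learned-weight summation discussed above could look roughly like this, a sketch under the assumption that the speaker embedding is simply added to the encoder/decoder input; the class and parameter names are made up:

```python
import torch
import torch.nn as nn

class WeightedSpeakerSum(nn.Module):
    """Learns per-dimension weights for the speaker embedding instead of a plain addition (sketch)."""
    def __init__(self, d_model):
        super().__init__()
        # Start from ones so the initial behaviour matches the equal-weight summation
        self.spk_weight = nn.Parameter(torch.ones(d_model))

    def forward(self, seq, spk_emb):
        # seq: [B, T, d_model]; spk_emb: [B, d_model]
        return seq + self.spk_weight * spk_emb.unsqueeze(1)
```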