Adriana STAN
Just to improve the speaker control.
Is it noise-noise or speech-like noise? Is the symbol list you use at training the same as the one used at inference? Is the transcription correct and aligned with the...
I am afraid I cannot share the dataset, but I did downsample the audio, trim the silence, and normalise the volume. Maybe you can try using one of the single-speaker...
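In case it helps, here is a minimal sketch of that kind of preprocessing; the target sample rate, trim threshold, and function names are assumptions, not the exact pipeline used:

```python
import librosa
import soundfile as sf

def preprocess(in_path, out_path, target_sr=22050, top_db=40):
    # Downsample on load (22.05 kHz is an assumed target, not necessarily the one used)
    audio, _ = librosa.load(in_path, sr=target_sr)
    # Trim leading/trailing silence below the chosen dB threshold
    audio, _ = librosa.effects.trim(audio, top_db=top_db)
    # Peak-normalise the volume
    audio = audio / max(abs(audio).max(), 1e-8)
    sf.write(out_path, audio, target_sr)
```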
Thanks for your reply! We are using 50 speakers with 200 utts/speaker, and it still changes the identity. We are now retraining using the ideas here: https://github.com/NVIDIA/DeepLearningExamples/issues/707#issuecomment-727021066
Hi @alancucki,

So we tried all the methods mentioned so far:
- balancing the data
- adding the speaker conditioning on the decoder side as well (see the sketch below)
- using two attention...
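A rough sketch of what the decoder-side speaker conditioning could look like (the module and argument names here are hypothetical, not the actual FastPitch code):

```python
import torch
import torch.nn as nn

class SpeakerConditionedDecoderInput(nn.Module):
    """Projects a speaker embedding and adds it to the decoder input (hypothetical sketch)."""
    def __init__(self, n_speakers, d_model, d_spk=64):
        super().__init__()
        self.spk_emb = nn.Embedding(n_speakers, d_spk)
        self.proj = nn.Linear(d_spk, d_model)

    def forward(self, dec_in, speaker_ids):
        # dec_in: [B, T, d_model]; speaker_ids: [B]
        spk = self.proj(self.spk_emb(speaker_ids))   # [B, d_model]
        return dec_in + spk.unsqueeze(1)             # broadcast over the time axis
```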
But this is what happens now in FastPitch: https://github.com/NVIDIA/DeepLearningExamples/blob/afea561ecff80b82f17316a0290f6f34c486c9a5/PyTorch/SpeechSynthesis/FastPitch/fastpitch/transformer.py#L207
Ok, got it. Still, this only means that instead of having equal weights in the summation, the network learns the individual summation weights. So did you try this and got...
Ok, great, I will give it a try. Thanks!
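For reference, the learned-weight summation discussed above could look roughly like this, a sketch under the assumption that the speaker embedding is simply added to the encoder/decoder input; the class and parameter names are made up:

```python
import torch
import torch.nn as nn

class WeightedSpeakerSum(nn.Module):
    """Learns per-dimension weights for the speaker embedding instead of a plain addition (sketch)."""
    def __init__(self, d_model):
        super().__init__()
        # Start from ones so the initial behaviour matches the equal-weight summation
        self.spk_weight = nn.Parameter(torch.ones(d_model))

    def forward(self, seq, spk_emb):
        # seq: [B, T, d_model]; spk_emb: [B, d_model]
        return seq + self.spk_weight * spk_emb.unsqueeze(1)
```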