Adriana STAN

Results: 18 comments by Adriana STAN

Is it noise-noise or speech-like noise? Is the symbol list you use at training the same as the one used at inference? Is the transcription correct and aligned with the...

I am afraid I cannot share the dataset, but I downsampled the audio, trimmed the silence, and normalised the volume. Maybe you can try using one of the single-speaker...
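For readers who want to try something similar, here is a minimal sketch of two of the preprocessing steps mentioned above (silence trimming and volume normalisation). The comment does not say which tools were actually used, so the function names, the amplitude threshold, and the target peak below are all illustrative assumptions. Downsampling is left out on purpose, since it needs a proper resampler with an anti-aliasing filter.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than `threshold`.

    `threshold` is a hypothetical amplitude cut-off, not a value from the thread.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    return samples[loud[0]:loud[-1] + 1]


def peak_normalise(samples, target_peak=0.95):
    """Scale the clip so its largest absolute sample equals `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# Toy waveform: near-silent edges around a short burst of signal.
audio = [0.0, 0.001, 0.2, -0.4, 0.1, 0.002, 0.0]
clean = peak_normalise(trim_silence(audio))
```

Peak normalisation is only one way to even out volume across speakers; loudness-based normalisation is a common alternative.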

Thanks for your reply! We are using 50 speakers with 200 utts/speaker, and it still changes the identity. We are now retraining using the ideas here: https://github.com/NVIDIA/DeepLearningExamples/issues/707#issuecomment-727021066

Hi @alancucki, we tried all the methods mentioned so far:
- balancing the data
- adding the speaker conditioning on the decoder side as well
- using two attention...

But this is what happens now in FastPitch: https://github.com/NVIDIA/DeepLearningExamples/blob/afea561ecff80b82f17316a0290f6f34c486c9a5/PyTorch/SpeechSynthesis/FastPitch/fastpitch/transformer.py#L207

OK, got it. Still, this only means that instead of using equal weights in the summation, the network learns the individual summation weights. So did you try this and get...
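The difference between an equal-weight summation and a weighted one can be sketched as follows. This is not the FastPitch code: the `combine` helper, the vector values, and the weights are hypothetical. In a real model the weights would be trainable parameters updated by the optimiser, whereas here they are fixed numbers for illustration.

```python
def combine(inputs, weights=None):
    """Sum equally sized vectors, optionally scaling each by a scalar weight."""
    if weights is None:
        # Plain summation: every input contributes with weight 1.
        weights = [1.0] * len(inputs)
    if len(weights) != len(inputs):
        raise ValueError("need exactly one weight per input vector")
    dim = len(inputs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, inputs)) for i in range(dim)]


# Hypothetical token and speaker embeddings of the same size.
token_emb = [0.5, -1.0, 2.0]
speaker_emb = [1.0, 1.0, 1.0]

# Equal weights: the two embeddings are simply added.
equal = combine([token_emb, speaker_emb])

# Per-input weights: stand-ins for values a network might learn.
weighted = combine([token_emb, speaker_emb], weights=[1.0, 0.25])
```

With learned weights the model can scale the speaker contribution up or down, but the conditioning still enters through a sum, which is the point made in the comment above.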