mindmapper15

Results: 22 comments by mindmapper15

I've also tried multiple GPUs and had the same issue as you. I then found that the biggest overhead is in the GE2ELoss part, especially computing the cosine similarity matrix and calculating the loss. https://github.com/HarryVolek/PyTorch_Speaker_Verification/blob/11b1d1932b0a226de9cabd8652c0c2ea1446611f/utils.py...
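The similarity matrix that comment refers to can be computed without nested Python loops. Below is a minimal, hedged sketch (not the repo's actual `utils.py` code) of the GE2E-style matrix, assuming embeddings shaped (N speakers, M utterances, D dims); for simplicity it compares every utterance against every speaker's full centroid, whereas the original GE2E loss excludes an utterance from its own speaker's centroid:

```python
import numpy as np

def cosine_sim_matrix(embeddings, eps=1e-6):
    """Vectorized sketch: similarity of every utterance to every
    speaker centroid. embeddings: (N, M, D) -> returns (N, M, N)."""
    # Unit-normalize each utterance embedding.
    norm = embeddings / (np.linalg.norm(embeddings, axis=2, keepdims=True) + eps)
    # One centroid per speaker, also unit-normalized.
    centroids = norm.mean(axis=1)                                       # (N, D)
    centroids /= (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    # Cosine similarity = dot product of unit vectors.
    return norm @ centroids.T                                           # (N, M, N)

sim = cosine_sim_matrix(np.random.randn(4, 5, 16))
```

A single batched matrix product like this is typically far cheaper than looping over speakers and utterances in Python, which is where the overhead mentioned above comes from.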

(I translated your question with Google Translate.) Set the parameter "voc_gen_batched" to False in your hparams.py. Although batched WaveRNN is much faster than the original WaveRNN, it is a trade-off feature....

You don't need to re-train your vocoder; voc_gen_batched affects inference only.
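For reference, this is roughly what the relevant section of fatchord/WaveRNN's hparams.py looks like; the numeric values shown are the repo's defaults at the time of writing, so check your own copy before editing:

```python
# hparams.py (fatchord/WaveRNN) -- vocoder generation settings.
# Disabling batched ("folded") generation is slower but can avoid
# the artifacts introduced by splitting audio into overlapping folds.
voc_gen_batched = False   # True = fast folded generation, False = sequential

# These only take effect when voc_gen_batched is True:
voc_target = 11_000       # target samples per fold
voc_overlap = 550         # crossfaded overlap between folds
```

Since this flag is read only at generation time, flipping it requires no retraining, as noted above.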

@freecui I implemented my own batched-mode WaveRNN that generates multiple "unbatched" audio clips at once (meaning a single clip is not split into multiple segments). It's still slower than...

Besides the fact that attention is not very robust for long sentences, the maximum number of decoder RNN time steps is (max_mel_len // reduction_factor). Increasing the number of time steps in the RNN...
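The formula in that comment is simple to illustrate: each decoder step emits `reduction_factor` mel frames, so the step budget is the mel-length cap divided by that factor. The values below are hypothetical, not from any particular config:

```python
# Illustrative numbers only.
max_mel_len = 1000       # longest mel spectrogram allowed (frames)
reduction_factor = 5     # mel frames predicted per decoder step (Tacotron-style)

# Maximum number of decoder RNN time steps, per the comment's formula.
max_decoder_steps = max_mel_len // reduction_factor
print(max_decoder_steps)  # 200
```

Raising `max_mel_len` (or lowering `reduction_factor`) therefore directly increases how many steps the decoder RNN must run.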

@gerbill, @zirlman Did you set voc_gen_batched=True in your [hparams.py](https://github.com/fatchord/WaveRNN/blob/master/hparams.py)? If so, WaveRNN inference should be fast. I got an inference speed of 1700 samples/sec when I set...

@gerbill There are many variables that can make your synthesized audio quality worse: a less-trained TTS model, a less-trained WaveRNN model, etc. Could you upload some more information? (training steps of each...

Usually, MOL sounds better than RAW mode once both models have fully converged, because of the quantization error in RAW mode. What about disabling batched generation mode? Did you try that?
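The quantization error mentioned here is easy to demonstrate. RAW mode predicts one of 2**bits discrete sample levels, so the decoded waveform can differ from the true one by up to half a quantization step, while MoL models a continuous output distribution. A toy sketch with made-up parameters (plain linear quantization, not the mu-law encoding some configs use):

```python
import numpy as np

bits = 9                   # illustrative bit depth for RAW-mode output classes
levels = 2 ** bits         # 512 discrete levels

x = np.linspace(-1.0, 1.0, 1000)              # "true" waveform samples
q = np.round((x + 1) / 2 * (levels - 1))      # encode to integer levels 0..511
x_hat = q / (levels - 1) * 2 - 1              # decode back to [-1, 1]

# Worst-case reconstruction error is half a quantization step.
max_err = np.abs(x - x_hat).max()
```

This error floor is inherent to the discrete output, which is one reason a well-converged MoL model can sound cleaner.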

> Is there any reference about MOL training? How does it converge, or how much data does training need?
> Thanks.

In my case, I've trained WaveRNN MOL with the LJSpeech dataset,...

https://github.com/Rayhane-mamah/Tacotron-2/issues/155#issuecomment-413364857 https://github.com/ibab/tensorflow-wavenet/issues/347 Yeah, that's totally normal. Even if you use softmax rather than MoL, random sampling from the softmax distribution gives better results than choosing the argmax.
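The sampling-vs-argmax point above can be sketched in a few lines. A softmax-output vocoder produces a categorical distribution over quantized sample values at each timestep; drawing from it preserves the signal's noise floor, whereas always taking the argmax tends to produce buzzy, overly deterministic audio. The logits below are random toy values, not real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=512)        # one timestep over 512 (9-bit) classes

# Numerically stable softmax.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

argmax_sample = int(np.argmax(probs))           # deterministic: always the mode
random_sample = int(rng.choice(512, p=probs))   # stochastic: what the linked repos do
```

Both picks are valid class indices; the difference is only audible over many timesteps, where the stochastic choice reintroduces natural variation.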