mindmapper15

Results: 22 comments by mindmapper15

I've also tried multiple GPUs and had the same issue as you. I then found that the biggest overhead is in the GE2ELoss part, especially computing the cosine similarity matrix and calculating the loss. https://github.com/HarryVolek/PyTorch_Speaker_Verification/blob/11b1d1932b0a226de9cabd8652c0c2ea1446611f/utils.py...
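The similarity matrix that comment refers to can be computed without nested Python loops. Below is a minimal, hedged sketch (not the repo's actual `utils.py` code) of the GE2E-style matrix, assuming embeddings shaped (N speakers, M utterances, D dims); for simplicity it compares every utterance against every speaker's full centroid, whereas the original GE2E loss excludes an utterance from its own speaker's centroid:

```python
import numpy as np

def cosine_sim_matrix(embeddings, eps=1e-6):
    """Vectorized sketch: similarity of every utterance to every
    speaker centroid. embeddings: (N, M, D) -> returns (N, M, N)."""
    # Unit-normalize each utterance embedding.
    norm = embeddings / (np.linalg.norm(embeddings, axis=2, keepdims=True) + eps)
    # One centroid per speaker, also unit-normalized.
    centroids = norm.mean(axis=1)                                       # (N, D)
    centroids /= (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    # Cosine similarity = dot product of unit vectors.
    return norm @ centroids.T                                           # (N, M, N)

sim = cosine_sim_matrix(np.random.randn(4, 5, 16))
```

A single batched matrix product like this is typically far cheaper than looping over speakers and utterances in Python, which is where the overhead mentioned above comes from.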

(I translated your question with Google Translate.) Set the parameter "voc_gen_batched" to False in your hparams.py. Although batched WaveRNN is much faster than the original WaveRNN, it is a trade-off feature....

You don't need to re-train your vocoder; voc_gen_batched affects inference only.
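For reference, this is roughly what the relevant section of fatchord/WaveRNN's hparams.py looks like; the numeric values shown are the repo's defaults at the time of writing, so check your own copy before editing:

```python
# hparams.py (fatchord/WaveRNN) -- vocoder generation settings.
# Disabling batched ("folded") generation is slower but can avoid
# the artifacts introduced by splitting audio into overlapping folds.
voc_gen_batched = False   # True = fast folded generation, False = sequential

# These only take effect when voc_gen_batched is True:
voc_target = 11_000       # target samples per fold
voc_overlap = 550         # crossfaded overlap between folds
```

Since this flag is read only at generation time, flipping it requires no retraining, as noted above.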

@freecui I implemented my own batched-mode WaveRNN that generates multiple "unbatched" audio clips at once (meaning a single clip is not split into multiple segments). It's still slower than...

Besides the fact that attention is not very robust for long sentences, the maximum number of decoder RNN time steps is (max_mel_len // reduction_factor). Increasing the number of time steps in the RNN...
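The formula in that comment is simple to illustrate: each decoder step emits `reduction_factor` mel frames, so the step budget is the mel-length cap divided by that factor. The values below are hypothetical, not from any particular config:

```python
# Illustrative numbers only.
max_mel_len = 1000       # longest mel spectrogram allowed (frames)
reduction_factor = 5     # mel frames predicted per decoder step (Tacotron-style)

# Maximum number of decoder RNN time steps, per the comment's formula.
max_decoder_steps = max_mel_len // reduction_factor
print(max_decoder_steps)  # 200
```

Raising `max_mel_len` (or lowering `reduction_factor`) therefore directly increases how many steps the decoder RNN must run.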

@gerbill, @zirlman Did you set voc_gen_batched=True in your [hparams.py](https://github.com/fatchord/WaveRNN/blob/master/hparams.py)? If so, WaveRNN inference should be fast. I got an inference speed of 1700 samples/sec when I set...

@gerbill There are many variables that can make your synthesized audio quality worse: a less-trained TTS model, a less-trained WaveRNN model, etc. Could you upload some more information? (training steps of each...

Usually, MOL sounds better than RAW mode once both models have fully converged, because of the quantization error in RAW mode. What about disabling batched generation mode? Did you try that?
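The quantization error mentioned here is easy to demonstrate. RAW mode predicts one of 2**bits discrete sample levels, so the decoded waveform can differ from the true one by up to half a quantization step, while MoL models a continuous output distribution. A toy sketch with made-up parameters (plain linear quantization, not the mu-law encoding some configs use):

```python
import numpy as np

bits = 9                   # illustrative bit depth for RAW-mode output classes
levels = 2 ** bits         # 512 discrete levels

x = np.linspace(-1.0, 1.0, 1000)              # "true" waveform samples
q = np.round((x + 1) / 2 * (levels - 1))      # encode to integer levels 0..511
x_hat = q / (levels - 1) * 2 - 1              # decode back to [-1, 1]

# Worst-case reconstruction error is half a quantization step.
max_err = np.abs(x - x_hat).max()
```

This error floor is inherent to the discrete output, which is one reason a well-converged MoL model can sound cleaner.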

> Is there any reference about MOL training? How does it converge, or how much data does training need?
> Thanks.

In my case, I've trained WaveRNN MOL with the LJSpeech dataset,...

https://github.com/Rayhane-mamah/Tacotron-2/issues/155#issuecomment-413364857 https://github.com/ibab/tensorflow-wavenet/issues/347 Yeah, that's totally normal. Even if you use softmax rather than MoL, random sampling from the softmax distribution gives better results than choosing the argmax.
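The sampling-vs-argmax point above can be sketched in a few lines. A softmax-output vocoder produces a categorical distribution over quantized sample values at each timestep; drawing from it preserves the signal's noise floor, whereas always taking the argmax tends to produce buzzy, overly deterministic audio. The logits below are random toy values, not real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=512)        # one timestep over 512 (9-bit) classes

# Numerically stable softmax.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

argmax_sample = int(np.argmax(probs))           # deterministic: always the mode
random_sample = int(rng.choice(512, p=probs))   # stochastic: what the linked repos do
```

Both picks are valid class indices; the difference is only audible over many timesteps, where the stochastic choice reintroduces natural variation.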