
change Griffin-Lim algorithm to WaveNet-based vocoder

Open lsq357 opened this issue 8 years ago • 7 comments

Does anyone plan to use a WaveNet-based vocoder instead of the Griffin-Lim algorithm? It greatly increased audio quality in the single-speaker experiments of Deep Voice 2 and Deep Voice 3. In the Deep Voice 3 paper, both Deep Voice 3 + WaveNet and Tacotron + WaveNet achieve the highest MOS of 3.78. Furthermore, it is claimed that Deep Voice 3 + WaveNet is faster to train and faster to converge than Tacotron + WaveNet.

lsq357 avatar Oct 17 '17 04:10 lsq357

I've been trying to use WaveNet instead of Griffin-Lim recently.

feiyuhong avatar Nov 03 '17 07:11 feiyuhong

@feiyuhong Have you finished it? Can you show me some experiment results and loss curves?

lsq357 avatar Nov 03 '17 09:11 lsq357

@feiyuhong can you share link to some code? I'm interested in porting it to pytorch!

rafaelvalle avatar Nov 17 '17 07:11 rafaelvalle

In my results, I can't get clear voice after a week of training with batch_size=32. Also, it is too slow: I can only train ~1.6k steps per day on my single GTX 1080 Ti GPU. I need more GPUs.

lsq357 avatar Dec 07 '17 03:12 lsq357

@keithito Do you plan to use a WaveNet vocoder with Tacotron? There is a new repo for this task: https://github.com/r9y9/wavenet_vocoder. Hoping to see sound quality as good as in the paper.

toannhu avatar Feb 06 '18 11:02 toannhu

@toannhu Yes, that repo looks great! I'm training right now on LJ Speech. There's some more discussion over at https://github.com/keithito/tacotron/issues/90

keithito avatar Feb 07 '18 07:02 keithito
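For anyone sizing up WaveNet training cost (see the ~1.6k steps/day comment above), the receptive field of the dilated convolution stack is the key number. Here is a small sketch of the standard receptive-field calculation; the layer/stack/kernel values are illustrative defaults for a typical WaveNet, not taken from any specific repo's config:

```python
def receptive_field(layers_per_stack=10, stacks=3, kernel_size=2):
    """Receptive field (in samples) of a stack of dilated causal convolutions.

    Dilations double each layer (1, 2, 4, ..., 2**(layers_per_stack-1)),
    and the whole pattern is repeated `stacks` times.
    """
    dilations = [2 ** i for i in range(layers_per_stack)] * stacks
    # each layer with dilation d and kernel k widens the field by (k - 1) * d
    return (kernel_size - 1) * sum(dilations) + 1

if __name__ == "__main__":
    rf = receptive_field()
    print(rf, "samples")            # 3070 samples
    print(rf / 22050, "seconds")    # ~0.14 s of context at 22.05 kHz
```

With these illustrative settings, the model only sees about 140 ms of audio context per output sample, which is part of why conditioning on mel spectrograms (as the linked repo does) is essential.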

I wonder why the Griffin-Lim samples here https://nv-adlr.github.io/WaveGlow sound as good as the other methods. Is it the large number of iterations, or just a good output spectrogram from Tacotron?

mrgloom avatar Mar 23 '19 21:03 mrgloom