
Literature Survey TTS

Open rmalav15 opened this issue 6 years ago • 4 comments

Hi Kyubyong Park,

Many thanks for the wonderful git profile. It is very helpful, especially for someone like me who is very interested in the TTS domain.

I have been exploring the recent work in this domain, namely Facebook's VoiceLoop and the models you have already implemented: DeepVoice3, DCTTS, and Tacotron. The samples provided by these works (DeepVoice3, Tacotron, and DCTTS) are almost human-like. With the current code that you and the open-source community have implemented, however, it is a slightly different picture: the final speech sounds robotic. I was initially more interested in going ahead with Tacotron, but after comparing your results for Tacotron and DCTTS, I like DCTTS better. Could you suggest which approach I should take if I want to generate human-like speech for my future research? (I am not looking at Facebook's VoiceLoop for now.)

The DeepVoice3 paper mentions (quote):

The WaveNet vocoder sounds more natural as the WORLD vocoder introduces various noticeable artifacts.

Could you please tell me which vocoder is best at generating human-like speech, based on your experience and domain knowledge?

It would be a huge help if you could provide a comparison of these methods in terms of training difficulty, generation time (real time or not), and final result.

Apologies for choosing this inappropriate platform for such a discussion, and for my lengthy question. I am a newbie in this domain.

Thanks in Advance,

Ram

rmalav15, Feb 27 '18 15:02

Hi @Kyubyong

Hoping for a response. It would be a great help.

Kamsamgida.

rmalav15, Mar 01 '18 09:03

@rmalav15 Have you tried the https://github.com/r9y9/deepvoice3_pytorch implementation? It supports multi-speaker TTS and Japanese TTS, and it is better than DCTTS. The author of that project is currently integrating a WaveNet vocoder; see the samples at https://github.com/r9y9/wavenet_vocoder

amilamad, Mar 04 '18 07:03

Thank you @amilamad,

I have looked at r9y9's DeepVoice3 implementation, but I wasn't aware of the WaveNet vocoder integration, so thank you. I recently watched the Two Minute Papers video on WaveNet's new release, so I think it will surely help me with my project.

rmalav15, Mar 04 '18 21:03

This repo has good quality: https://github.com/NVIDIA/tacotron2 However, Tacotron2 + WaveGlow is slow.

mrgloom, Mar 24 '19 22:03