tensorflow-wavenet

Text-to-speech

Open aelbialy-tbox opened this issue 7 years ago • 6 comments

Is there currently a way to have my trained model say a specific word or sentence that I give it? If not, how can I implement this? Are there any resources or repos that could help with that?

aelbialy-tbox, Jun 12 '17 09:06

Try Tacotron.

greigs, Jun 12 '17 10:06

Not with the current implementation. This has been discussed in the past (see #252). WaveNet is not the best choice for "raw" text-to-speech anyway (Tacotron is indeed better), as it requires a lot of auxiliary components (the speech frontend) to make it work. If you want to see what a full TTS pipeline looks like, try Merlin. WaveNet is still great for other tasks, though (as a music encoder, as a time series model for other data, as a "decoder" for audio spectra...).

lemonzi, Jun 12 '17 17:06

Is that still the state of it? I know there have been a lot of WaveNet changes in the past month or so, and Google is implementing it for their voice assistant.

tibbon, Oct 27 '17 17:10

If we can't give it words or sentences to generate sound from, what happens when we run generate.py? What is the content of the generated voice?

burakipekk, Dec 12 '17 07:12

Most probably garbled speech (resembling some foreign language, but not forming meaningful words in any language), similar to the samples at https://deepmind.com/blog/wavenet-generative-model-raw-audio/ in the section "Knowing What to Say".
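
As a rough illustration of why the output is babble, here is a toy sketch of unconditioned autoregressive sampling. It is not the repository's generate.py: `toy_next_sample_probs` is a hypothetical stand-in for a trained network, and the `text_features` argument is an assumed placeholder for the linguistic conditioning a TTS frontend would supply. The only detail taken from WaveNet itself is the 256-level (8-bit mu-law) output.

```python
import random

QUANTIZATION_LEVELS = 256  # WaveNet models 8-bit mu-law quantized audio


def toy_next_sample_probs(history, text_features=None):
    """Hypothetical stand-in for a trained network.

    Returns a probability distribution over the next quantized sample.
    Without text_features it depends only on previously generated audio.
    """
    center = history[-1] if history else QUANTIZATION_LEVELS // 2
    if text_features is not None:
        # Assumed toy conditioning: pull the prediction toward a per-step feature.
        feature = text_features[len(history) % len(text_features)]
        center = (center + feature) // 2
    weights = [1.0 / (1 + abs(level - center)) for level in range(QUANTIZATION_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, text_features=None, seed=0):
    """Draw samples one at a time, feeding each one back in as history."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = toy_next_sample_probs(samples, text_features)
        next_sample = rng.choices(range(QUANTIZATION_LEVELS), weights=probs, k=1)[0]
        samples.append(next_sample)
    return samples


# No text goes in, so nothing tells the model which words to say.
unconditioned = generate(16)
# With (hypothetical) conditioning features, the same loop is steered by them.
conditioned = generate(16, text_features=[0, 255])
```

In the unconditioned call, each sample depends only on the audio generated so far, so a model trained on speech can sound speech-like without containing any intended words.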

Kungergely, Feb 06 '18 14:02

I'd like to know as well, especially after that Google I/O conference.

neil-119, May 11 '18 21:05