tensorflow-wavenet
Text-to-speech
Is there currently a way to make my model's generated voice say a specific word or sentence that I give it? If not, how could I implement this? Or are there any resources/repos that could help with that?
Try Tacotron.
Not with the current implementation; this has been discussed in the past (see #252). WaveNet is not the best choice for "raw" text-to-speech anyway (Tacotron is indeed better), because it needs a lot of auxiliary components (the speech frontend) to make it work. If you want to see what a full TTS pipeline looks like, try Merlin. WaveNet is still great for other tasks, though (as a music encoder, as a time-series model for other data, as a "decoder" for audio spectra...).
Is that still the state of it? I know there have been a lot of WaveNet changes in the past month or so, and Google is implementing it for their voice assistant.
If we can't give it words or sentences to generate sound from, what is happening when we run generate.py? What is the content of the generated voice?
Most probably garbled speech (resembling some foreign language, but not forming meaningful words in any language), similar to the samples at https://deepmind.com/blog/wavenet-generative-model-raw-audio/ in the section "knowing what to say".
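To illustrate why the output is babble, here is a minimal sketch of unconditioned autoregressive sampling. It assumes 256-level mu-law quantization as described in the WaveNet paper and an assumed 16 kHz sample rate. `toy_next_sample_distribution` is a hypothetical stand-in for the trained network, and this loop is not a copy of `generate.py`. The point is that the only input at each step is the model's own previous output, so nothing tells it which words to say.

```python
import numpy as np

QUANTIZATION_CHANNELS = 256  # assumption: 8-bit mu-law, as in the WaveNet paper


def mu_law_decode(indices, channels=QUANTIZATION_CHANNELS):
    """Map integer class indices in [0, channels) back to amplitudes in [-1, 1]."""
    mu = channels - 1
    signal = 2.0 * (indices.astype(np.float64) / mu) - 1.0
    return np.sign(signal) * ((1.0 + mu) ** np.abs(signal) - 1.0) / mu


def toy_next_sample_distribution(history, rng):
    """Hypothetical stand-in for a trained WaveNet.

    A real model would compute this distribution from its receptive field of
    past samples; here we return a softmax over random logits that is biased
    toward repeating the previous sample, just so the loop has something to draw from.
    """
    logits = rng.normal(size=QUANTIZATION_CHANNELS)
    logits[history[-1]] += 5.0
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


def generate(num_samples, seed=0):
    """Sample audio one step at a time with no text or other conditioning input."""
    rng = np.random.default_rng(seed)
    history = [QUANTIZATION_CHANNELS // 2]  # seed with a near-silent sample
    for _ in range(num_samples):
        probs = toy_next_sample_distribution(history, rng)
        # The next sample depends only on previously generated samples,
        # which is why an unconditioned model produces speech-like babble.
        history.append(int(rng.choice(QUANTIZATION_CHANNELS, p=probs)))
    return mu_law_decode(np.array(history[1:]))
```

For example, `audio = generate(16000)` would produce about one second of (meaningless) audio at the assumed 16 kHz rate. Making the model say specific words would require feeding extra conditioning features into every step of that loop, which is the missing "speech frontend" mentioned above.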
I'd like to know as well, especially after that Google I/O conference.