tensorflow-wavenet
Is it possible to condition on any utterance?
Hi all,
There's something I'm confused about: how to generate meaningful sentences in different voices.
In the DeepMind blog post, we can hear samples of meaningful speech in different voices. Is this done by conditioning the WaveNet both locally and globally, or is it conditioned only globally on a meaningful sentence in different voices?
They condition globally to select which speaker is used, and locally to feed linguistic features from the text, coming from a TTS frontend. See, for example, #235.
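To make the distinction concrete, here is a minimal NumPy sketch of one gated WaveNet activation with both kinds of conditioning, following the equations from the paper. The names (`h_global`, `y_local`, the weight matrices) are illustrative, not this repo's API: the global vector is projected once and added at every timestep, while the local features contribute a different vector at each timestep.

```python
import numpy as np

def gated_unit(x, h_global, y_local, W_f, W_g, Vg_f, Vg_g, Vl_f, Vl_g):
    """One WaveNet gated activation with global and local conditioning.

    x        : (T, C) output of the dilated convolution for this layer
    h_global : (G,)   speaker embedding, constant over time (global)
    y_local  : (T, L) per-timestep features, e.g. linguistic features
                      or a spectrogram upsampled to length T (local)
    """
    # Global conditioning: one projected vector, broadcast to every timestep.
    g_f = h_global @ Vg_f                 # (C,)
    g_g = h_global @ Vg_g
    # Local conditioning: a 1x1 projection applied at each timestep.
    l_f = y_local @ Vl_f                  # (T, C)
    l_g = y_local @ Vl_g
    filt = np.tanh(x @ W_f + g_f + l_f)
    gate = 1.0 / (1.0 + np.exp(-(x @ W_g + g_g + l_g)))
    return filt * gate                    # (T, C)
```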
@lemonzi How do you decide whether something is local conditioning or global conditioning? If, instead of feeding linguistic features, we feed features from a spectrogram, how is the local conditioning handled?
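For what it's worth, the usual distinction is whether the conditioning signal varies over time: a speaker identity is constant across the utterance (global), while a spectrogram is time-varying (local) and just needs to be brought up to the audio sample rate before being fed into each layer. A rough sketch, assuming simple frame repetition (papers and implementations often use a learned transposed convolution instead):

```python
import numpy as np

def upsample_frames(spectrogram, hop_length):
    """Repeat each spectrogram frame hop_length times so there is one
    conditioning vector per audio sample (local conditioning input).

    spectrogram: (num_frames, num_bins)
    returns:     (num_frames * hop_length, num_bins)
    """
    return np.repeat(spectrogram, hop_length, axis=0)
```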