tacotron
How to improve text input notation (Question)
For example, for "blue" the Tacotron output speech is fast. What I want is to slow down certain parts of a word, for example "bluuue" instead of "blue". But when I input "bluuue", the pronunciation of the output changes strangely and no longer sounds like "blue". How can I achieve that?
I think you have a few options:
- Collect some training data with words spoken at different speeds, annotate the words with the speed, and train a model on that.
- Train a model using a phonetic alphabet and then insert additional phonemes into the words that you want to stretch out. See the CMUDict file in this repo for an example of translating to phonemes: https://github.com/keithito/tacotron/blob/master/text/cmudict.py
- Post-process the audio to slow down the words.
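For the phoneme option, here is a rough sketch of what "inserting additional phonemes" could look like. It assumes ARPAbet symbols as used in CMUDict, where vowels end in a stress digit (0, 1, or 2), and it assumes the model was trained on phoneme input written in curly braces. `stretch_vowels` and `to_tacotron_input` are hypothetical helpers written for illustration, not functions from this repo, and whether a repeated vowel actually comes out longer will depend on the trained model.

```python
def stretch_vowels(phonemes, repeat=3):
    """Repeat each vowel phoneme `repeat` times; consonants pass through unchanged."""
    stretched = []
    for p in phonemes:
        # Assumption: ARPAbet vowels carry a trailing stress digit (e.g. UW1, AH0).
        is_vowel = p[-1].isdigit()
        stretched.extend([p] * (repeat if is_vowel else 1))
    return stretched


def to_tacotron_input(phonemes):
    """Wrap a phoneme list in curly braces (assumed phoneme-input notation)."""
    return "{" + " ".join(phonemes) + "}"


# "blue" is B L UW1 in CMUDict
blue = ["B", "L", "UW1"]
print(to_tacotron_input(stretch_vowels(blue)))  # {B L UW1 UW1 UW1}
```

You would then feed the resulting string to the synthesizer in place of the plain word and listen to whether the vowel is held longer.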