fairseq
For MMS TTS, is it possible to add pauses, emotion, inflection, etc.?
❓ Questions and Help
What is your question?
I am experimenting with MMS TTS and learning how it works. I have it running and am curious whether the output can be adjusted to include things like pauses, emotion, and inflection.
The MMS TTS model (VITS) is a probabilistic model, so you will get different audio each time you run it unless you fix the random seed. More controllable generation (e.g., generating an utterance with a particular emotion) is not supported yet. We will incorporate that in our next release.
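To illustrate the point about the random seed, here is a toy sketch. It is not the VITS model: `toy_sample` is a hypothetical stand-in that just draws Gaussian noise per character, which is enough to show why a sampling-based generator changes between runs and becomes repeatable once seeded. With the real model you would presumably seed the framework's own generator instead (for PyTorch, `torch.manual_seed`) before synthesis; that part is an assumption and is not shown here.

```python
import random


def toy_sample(text, seed=None):
    """Hypothetical stand-in for a stochastic synthesizer.

    Returns one Gaussian noise value per input character. With seed=None
    the generator is seeded from system entropy, so repeated calls differ.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in text]


text = "Hello my name is Gosha"

# Same seed: the sampled values are identical on every call.
print(toy_sample(text, seed=0) == toy_sample(text, seed=0))

# Different seeds give different samples, hence "different audio".
print(toy_sample(text, seed=0) == toy_sample(text, seed=1))
```

Fixing the seed only makes a given run repeatable. It does not let you choose an emotion or inflection.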
I've found that, in its current state, pauses can be adjusted by adding spaces and apostrophes. For example, try generating these two inputs and comparing the results:
"Hello my name is Gosha"
"Hello ' my name is Gosha"
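If you want to try the apostrophe trick on several sentences, a small helper can build the input variants for you. This is only a text-preparation sketch: `insert_pause` is a hypothetical helper name, it does not call the model, and whether a given marker actually produces an audible pause is something you would need to check by listening.

```python
def insert_pause(text, after_word, marker="'"):
    """Insert a stand-alone marker token after the word at index after_word (0-based)."""
    words = text.split()
    if not 0 <= after_word < len(words):
        raise IndexError("after_word is out of range")
    words.insert(after_word + 1, marker)
    return " ".join(words)


plain = "Hello my name is Gosha"
paused = insert_pause(plain, 0)

print(plain)   # Hello my name is Gosha
print(paused)  # Hello ' my name is Gosha
```

Feed both strings to your TTS setup and compare the audio to judge the effect.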