
For MMS TTS, is it possible to add pauses, emotion, inflection, etc.?

Open JWesorick opened this issue 1 year ago • 1 comments

❓ Questions and Help

What is your question?

I am playing with and learning about MMS TTS. I have it running and am curious whether it is possible to adjust the output to include things like pauses, emotion, and inflection.

JWesorick avatar May 26 '23 08:05 JWesorick

The MMS TTS model (VITS) is a probabilistic model, so you will get different audio each time you run it unless the random seed is fixed. More controllable generation (e.g., generating an utterance with a particular emotion) is not supported yet. We plan to incorporate that in our next release.
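To illustrate the point about the random seed, here is a minimal sketch of seeding before each call so that repeated runs match. The names `run_with_seed` and `toy_synthesize` are hypothetical, and `toy_synthesize` is only a stand-in that draws random numbers; it is not the MMS/VITS inference code. With a PyTorch-based model, passing `torch.manual_seed` as `seed_fn` would presumably play the same role, assuming the model draws its noise from PyTorch's global generator.

```python
import random
from typing import Callable, List


def toy_synthesize(text: str) -> List[float]:
    # Stand-in for a stochastic TTS call (hypothetical, not MMS code):
    # draws one random value per word instead of producing audio.
    return [random.gauss(0.0, 1.0) for _ in text.split()]


def run_with_seed(
    synthesize: Callable[[str], List[float]],
    text: str,
    seed: int,
    seed_fn: Callable[[int], object] = random.seed,
) -> List[float]:
    # Fix the seed immediately before sampling so the output is repeatable.
    seed_fn(seed)
    return synthesize(text)


first = run_with_seed(toy_synthesize, "Hello my name is Gosha", seed=0)
second = run_with_seed(toy_synthesize, "Hello my name is Gosha", seed=0)
other = run_with_seed(toy_synthesize, "Hello my name is Gosha", seed=1)

print(first == second)  # same seed, same samples
print(first == other)   # different seed, different samples
```

Fixing the seed only makes a given output repeatable; it does not give control over emotion or inflection.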

chevalierNoir avatar May 26 '23 21:05 chevalierNoir

I've found that, in the current state, pauses can be adjusted by adding spaces and apostrophes. For example, try generating these two inputs and compare the results: "Hello my name is Gosha" and "Hello ' my name is Gosha".
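A small sketch for producing such text variants to compare by ear. The helper name `add_pause` is hypothetical, and whether the marker actually yields an audible pause is an assumption that depends on the specific checkpoint and its text preprocessing; the code only prepares the input strings and does not call the model.

```python
def add_pause(text: str, after_word_index: int, marker: str = "'") -> str:
    """Insert a standalone marker (default: an apostrophe) after the given word."""
    words = text.split()
    if not 0 <= after_word_index < len(words) - 1:
        raise ValueError("after_word_index must point at a word before the last one")
    # Place the marker as its own space-separated token.
    words.insert(after_word_index + 1, marker)
    return " ".join(words)


base = "Hello my name is Gosha"
variants = [base, add_pause(base, 0), add_pause(base, 3)]
for variant in variants:
    print(variant)
# Hello my name is Gosha
# Hello ' my name is Gosha
# Hello my name is ' Gosha
```

Each variant can then be fed to the TTS model in place of the original text, ideally with a fixed seed so that differences come from the text rather than from sampling.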

Adlinga avatar Jun 20 '23 10:06 Adlinga