tortoise-tts

Multiple speakers defined in input text

Open • system1system2 opened this issue 2 years ago • 1 comment

Is it possible to define multiple speakers for different portions of the input text that you feed to read.py?

Maybe via SSML syntax, or (I'm dreaming here) with natural-language tags inside brackets, e.g., [Tom speaks:]?

system1system2, Dec 13 '22 12:12

As far as I know this isn't an existing feature and there are no plans to implement SSML, but what you're describing is fairly straightforward to achieve: pre-process the text to match each speaker's utterances to their loaded voice, generate the speech for each utterance independently, then combine the clips in the original order (as tortoise/read.py does for its text chunks).
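A rough sketch of that pre-processing step, assuming the bracket tag format from the question (`[Tom speaks:]`). The names `split_by_speaker`, `render`, and the `narrator` default are made up for illustration and are not part of tortoise; `synthesize` is a placeholder for whatever function turns a (speaker, text) pair into audio.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tag format from the question, e.g. "[Tom speaks:] Hello."
SPEAKER_TAG = re.compile(r"\[(\w+) speaks:\]")


def split_by_speaker(text: str, default_speaker: str = "narrator") -> List[Tuple[str, str]]:
    """Split tagged text into ordered (speaker, utterance) pairs."""
    segments = []
    speaker = default_speaker  # used for any text before the first tag
    pos = 0
    for match in SPEAKER_TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((speaker, chunk))
        speaker = match.group(1).lower()  # voice name to use from here on
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((speaker, tail))
    return segments


def render(segments: List[Tuple[str, str]], synthesize: Callable[[str, str], list]) -> list:
    """Generate each utterance in order and concatenate the results."""
    audio = []
    for speaker, utterance in segments:
        audio.extend(synthesize(speaker, utterance))
    return audio
```

With tortoise, `synthesize` would presumably load the voice named by `speaker` and run the TTS call on `utterance`, and the concatenation would be done on audio tensors the way read.py joins its parts, rather than on plain lists as here.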

That being said, since the text prompt itself affects the voice, the same speaker will sound noticeably different from one utterance to the next, which makes tortoise nearly impossible to work with for long inputs. You can change the generation params to make the voice more consistent, but then it also becomes bland.

aeciorc, Dec 14 '22 17:12