tortoise-tts
Is there a way for the Read.Py script to output 3 samples instead of just 1?
The problem here is with permutations, which you might have figured out. It's not clear to me what the best algorithm is for selecting which of the split generations gets precedence for being part of the final generation. Typically you would rank candidates by log likelihood, but that doesn't work here.
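To make the permutation problem concrete: if a long text is split into N chunks (e.g. sentences) and each chunk is generated with k candidate samples, there are k^N distinct ways to assemble the final output. Below is a minimal Python sketch, assuming each chunk's candidates are sampled independently. The function names are hypothetical and not part of tortoise-tts.

```python
from itertools import product


def count_assemblies(num_segments: int, candidates_per_segment: int) -> int:
    """Number of distinct final outputs when one candidate is picked
    independently for each segment."""
    return candidates_per_segment ** num_segments


def enumerate_assemblies(num_segments: int, candidates_per_segment: int) -> list:
    """Every possible assembly, as a tuple of chosen candidate indices
    (one index per segment). Only practical for tiny inputs."""
    return list(product(range(candidates_per_segment), repeat=num_segments))


if __name__ == "__main__":
    # 3 candidates per sentence, for texts of increasing length.
    for n in (1, 5, 10, 20):
        print(n, count_assemblies(n, 3))
```

With 3 candidates and 20 sentences that is already 3,486,784,401 combinations, so exhaustively scoring every possible assembly is not practical.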
Is this the reason why it can sometimes produce different voices from sentence to sentence when using Read.PY? Is there no way to produce consistent results for longer texts?
I'd also like to know this, as having a stable and consistent voice is very important for longer texts. If every sentence is completely different from the last, it kinda makes it pointless. I assume this is why 11labs has a 'stability' setting.