Voice / Speaker keeps changing mid sentence

Open lowl7 opened this issue 1 year ago • 6 comments

No matter what I do, the voice changes completely to a different speaker mid-sentence, even when the text contains no commas or periods.

Something like "Hello there are you having a great day today on this most joyfull day" will have two speakers with completely different tone and pitch.

Tortoise is pretty cool, but it's basically useless when I have to sit through the generation only for it to generate something like that with two voices.

Is there any solution?

lowl7 avatar Apr 29 '23 19:04 lowl7

I have the same problem, but it didn't happen a few weeks ago, so I'm not sure what has changed.

G-force78 avatar Apr 30 '23 11:04 G-force78

How many audio clips are you using? I had the same issue when only using 3 or 4 clips. I'm using 62 ten-second clips now and it sounds pretty good.

CodexOmega avatar Apr 30 '23 16:04 CodexOmega
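For anyone wanting to check their own reference set against the numbers mentioned above, here is a minimal sketch that counts the clips in a voice folder and reports their durations. The function name `audit_voice_dir` and the thresholds (`min_clips`, `min_seconds`, `max_seconds`) are assumptions made for illustration, not values taken from Tortoise itself. It uses only Python's standard `wave` module, so it reads PCM WAV files only.

```python
import wave
from pathlib import Path


def audit_voice_dir(voice_dir, min_clips=10, min_seconds=5.0, max_seconds=15.0):
    """Return (clip_count, durations, warnings) for the WAV files in voice_dir.

    The thresholds are illustrative assumptions, not Tortoise requirements.
    """
    durations = {}
    for path in sorted(Path(voice_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration in seconds = number of frames / frames per second.
            durations[path.name] = wav.getnframes() / wav.getframerate()

    warnings = []
    if len(durations) < min_clips:
        warnings.append(f"only {len(durations)} clips; fewer than {min_clips}")
    for name, seconds in durations.items():
        if not (min_seconds <= seconds <= max_seconds):
            warnings.append(
                f"{name}: {seconds:.1f}s is outside {min_seconds}-{max_seconds}s"
            )
    return len(durations), durations, warnings
```

Calling `audit_voice_dir("path/to/your/voice")` (a placeholder path) returns the clip count, a name-to-seconds mapping, and a list of warnings you can print before starting a long generation.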

Yeah, some voices are better or worse than others. I have made a pull request to add more voices: https://github.com/neonbjb/tortoise-tts/pull/425

Each voice has dozens of audio clips.

Try them out yourself and tell me how you like them. They are from https://dillonbecker.itch.io/sdap

n8bot avatar Apr 30 '23 18:04 n8bot

> How many audio clips are you using? I had the same issue when only using 3 or 4 clips. I'm using 62 ten-second clips now and it sounds pretty good.

I've tried as low as 2 clips and as high as 30 clips, no luck... voice changes every now and then.

lowl7 avatar May 10 '23 22:05 lowl7

I've had the same problem using one of the fine-tuned voices, train_kennard.

I'm generally struggling to keep the voice consistent from one sentence to the next; it seems to change substantially. Are there any ways to reduce this?

Florencehinder avatar Jun 01 '23 15:06 Florencehinder

> I've had the same problem using one of the fine-tuned voices, train_kennard.
>
> I'm generally struggling to keep the voice consistent from one sentence to the next; it seems to change substantially. Are there any ways to reduce this?

I think this is just one of the limitations of the model/code, as it seems to start iterating again with new metrics at each new sentence. Reading from a text file seems to work better with stop tokens, but the voice still changes, or else it doesn't vary at all and the result is a monotone parroting of words.

G-force78 avatar Jun 02 '23 08:06 G-force78
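One thing that might be worth trying for the sentence-to-sentence drift described above is to generate each sentence separately while reusing exactly the same voice conditioning and the same random seed every time. The sketch below shows only that control flow. The `synthesize` callable, its `(sentence, conditioning, seed)` signature, and the `conditioning` object are hypothetical stand-ins for whatever TTS call you wrap, and nothing in this thread confirms that this removes the voice changes.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize_consistently(text, synthesize, conditioning, seed=0):
    """Call synthesize once per sentence with identical conditioning and seed.

    `synthesize` is a hypothetical callable supplied by the caller; it is
    not part of any real library API.
    """
    return [synthesize(s, conditioning, seed) for s in split_sentences(text)]
```

The returned list holds one result per sentence in order, so the pieces can be concatenated afterwards. Keeping the TTS call injected as a parameter means the chunking logic can be checked with a dummy function before spending time on real generation.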