tortoise-tts

Audio quality degradation on long sentences

Open: Keith-Hon opened this issue 1 year ago • 4 comments

Keith-Hon, Aug 29 '23 03:08

Text prompt: Unions gave notice that industrial action at the Gorgon and Wheatstone facilities would begin on Sept. If no agreement is reached with Chevron on pay and working conditions. Strikes may not immediately affect production, but prolonged action increases the risk of disruption.

https://soundcloud.com/keith-hon/tortoise-tts-quality-degrade-sample

The audio quality keeps degrading as the output approaches the end of the sentence.

Keith-Hon, Aug 29 '23 03:08

Is it because the high-quality dataset used to fine-tune the model didn't contain many long sentences?

Keith-Hon, Aug 29 '23 03:08

Yes, both that and the fact that long sentences make up a very low proportion of the training dataset.

neonbjb, Aug 29 '23 14:08

I'm experiencing the same issue (the voice changes radically over the course of a long sentence). Would a good approach be to limit the length of the generated utterances to match the length of the clips used for the "cloned" voice? The attached examples are 15 sentences apart (in the first example the voice also swaps gender partway through).
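One way to try the "limit the utterance length" idea is to split the input text at sentence boundaries before synthesis, pack the sentences into chunks no longer than some character budget, synthesize each chunk separately, and concatenate the audio. The sketch below is only the text-splitting half, written as a standalone example: `split_into_chunks` and its 200-character default are hypothetical choices, not part of the tortoise-tts API, and the right budget would need to be tuned by ear against the length of your reference clips.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Hypothetical helper: split text into chunks of at most max_chars.

    Sentences are kept whole where possible; a sentence longer than
    max_chars is broken on word boundaries. A single word longer than
    max_chars is emitted as its own (over-long) chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Break any over-long sentence into word-boundary pieces.
    pieces: List[str] = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            pieces.append(sentence)
            continue
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_chars and current:
                pieces.append(current)
                current = word
            else:
                current = candidate
        if current:
            pieces.append(current)

    # Greedily pack consecutive pieces into chunks within the budget.
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Applied to the prompt from the original report with `max_chars=120`, each chunk stays within the budget and can be passed to the TTS call on its own. Smaller budgets trade fewer long-utterance artifacts for more audible seams between chunks.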

As a side note: having read through some of the long-ish posts, thank you for all your (continuing) hard work on this despite some, ahem, 'moral' differences with one or two users.

cweaver-logitech, Oct 23 '23 12:10