tortoise-tts
Audio quality degradation on long sentences
Text prompt: Unions gave notice that industrial action at the Gorgon and Wheatstone facilities would begin on Sept. If no agreement is reached with Chevron on pay and working conditions. Strikes may not immediately affect production, but prolonged action increases the risk of disruption.
https://soundcloud.com/keith-hon/tortoise-tts-quality-degrade-sample
The audio quality keeps decreasing as the output approaches the end of the sentence.
Is it because the high-quality dataset that was used to fine-tune the model didn't contain many long sentences?
Yes. Both that, and the fact that the proportion of long sentences in the training dataset as a whole is very low.
I'm experiencing the same issue (the voice changes radically over the course of a long sentence). Would a good approach be to limit the length of the generated utterances so that it mirrors the length of the "cloned" voice's samples? The attached examples are 15 sentences apart (the first example also swaps gender partway through).
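For anyone who wants to try the "limit the utterance length" idea, here is a minimal sketch of what I have in mind. It only covers the text side: it packs whole sentences into chunks under a character budget, and each chunk would then be synthesized on its own and the audio concatenated. The function name `split_into_chunks` and the default of 200 characters are my own placeholders, not anything from the tortoise-tts code, and the right limit would need to be found by ear.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the limit in that one case.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    # Abbreviations such as "Sept." will also trigger a break.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if not current or len(candidate) <= max_chars:
            # Either the chunk is empty or the sentence still fits: extend it.
            current = candidate
        else:
            # The sentence does not fit: close this chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    prompt = (
        "Strikes may not immediately affect production, "
        "but prolonged action increases the risk of disruption. "
        "Unions gave notice that industrial action would begin soon."
    )
    for chunk in split_into_chunks(prompt, max_chars=120):
        print(len(chunk), chunk)
```

The split is deliberately crude. It will not help with a single very long sentence like the one in the original report; that would need an extra split on commas or conjunctions, at the cost of less natural prosody at the joins.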
As a side note: having read through some of the long-ish posts, thank you for all your (continuing) hard work on this despite some, ahem, 'moral' differences with one or two users.