Thorsten-Voice How to make coqui thorsten voice "more fluent"

Open alexnanchen opened this issue 11 months ago • 2 comments

Hello,

Using coqui.ai and gruut, we have trained an example of "thorsten voice" with the provided VITS recipe (~60K steps). The result is good, but the rhythm of the speech is not as good as that of the "Thorsten voice".

See here for a comparison: https://htmlpreview.github.io/?https://github.com/alexnanchen/tts/blob/main/examples.html

How can we improve it?

Do we need to train for more steps?
Are there some specific parameters to tune?
Do we need to fine-tune the model on "accelerated speech"?

Many thanks!

Mar 19 '24 09:03 alexnanchen

Hi, if I remember correctly, we trained our Coqui VITS model up to nearly 1000k steps, but beyond the 600k mark there were no further improvements in quality, either audible or technical (MOSNet, DNSMOS, SRMR). I suggest continuing training to at least 300k steps.

Mar 19 '24 10:03 domcross

Thank you!

Mar 20 '24 14:03 alexnanchen
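
For readers who want to check where their own run stops improving, as the reply above describes for the 600k mark, here is a minimal sketch in plain Python. It assumes you have already scored each saved checkpoint with some objective metric where higher is better (a MOS-style estimate, for example). The function name `plateau_step`, the `min_gain` threshold, and the numbers in `example_scores` are all hypothetical placeholders for illustration; they are not measurements from the Thorsten-Voice training runs.

```python
def plateau_step(scores, min_gain=0.05):
    """Return the earliest step after which no later checkpoint
    improves the score by more than min_gain.

    scores: list of (step, score) pairs sorted by step, higher score = better.
    """
    if not scores:
        raise ValueError("scores must contain at least one checkpoint")
    for i, (step, score) in enumerate(scores):
        later_scores = [s for _, s in scores[i + 1:]]
        # The last checkpoint always qualifies, because nothing comes after it.
        if all(s - score <= min_gain for s in later_scores):
            return step


# Hypothetical placeholder values, NOT real results from any training run.
example_scores = [
    (60_000, 3.1),
    (300_000, 3.8),
    (600_000, 4.0),
    (1_000_000, 4.0),
]

print(plateau_step(example_scores))
```

If the returned step is the last checkpoint you have, the metric was still improving when training stopped, so training longer may be worthwhile. An objective score is only a proxy, so it is worth listening to samples from the same checkpoints as well.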
