Thorsten-Voice
How to make the Coqui Thorsten voice "more fluent"?
Hello,
Using coqui.ai and gruut, we trained a "Thorsten voice" model with the provided VITS recipe (~60K steps). The result is good, but the rhythm of the speech is not as good as in the original "Thorsten voice".
See here for a comparison: https://htmlpreview.github.io/?https://github.com/alexnanchen/tts/blob/main/examples.html
How can we improve it?
- Do we need to train for more steps?
- Are there some specific parameters to tune?
- Do we need to fine-tune the model on "accelerated speech"?
Many thanks!
Hi, if I remember correctly, we trained our Coqui-VITS model up to nearly 1000k steps, but past the 600k mark there were no further quality improvements, neither audible nor in the technical metrics (MOSNET, DNSMOS, SRMR). I suggest continuing training to at least 300k steps.
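Training configs are often set by epoch count rather than step count, so following the 300k-step suggestion means translating a step target into epochs. The sketch below does that arithmetic. It assumes the step counter advances once per batch (no gradient accumulation); the `drop_last` flag mirrors the PyTorch DataLoader option of the same name, and the dataset size and batch size in the example are hypothetical placeholders, not values from this thread.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Steps in one pass over the data, assuming one step per batch."""
    if drop_last:
        # The final incomplete batch is discarded.
        return num_samples // batch_size
    # The final incomplete batch still counts as one step.
    return math.ceil(num_samples / batch_size)


def epochs_for_target(target_steps: int, num_samples: int, batch_size: int,
                      drop_last: bool = False) -> int:
    """Smallest whole number of epochs that reaches at least target_steps."""
    per_epoch = steps_per_epoch(num_samples, batch_size, drop_last)
    if per_epoch == 0:
        raise ValueError("batch_size exceeds dataset size with drop_last=True")
    return math.ceil(target_steps / per_epoch)


# Hypothetical numbers: 20,000 utterances, batch size 32 (625 steps per epoch).
print(epochs_for_target(300_000, 20_000, 32))  # → 480
```

When resuming from the ~60K steps already trained, passing the remaining steps (e.g. `target_steps=240_000`) gives the number of additional epochs needed.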
Thank you!