How to make coqui thorsten voice "more fluent"

Open alexnanchen opened this issue 11 months ago • 2 comments

Hello,

Using coqui.ai and gruut, we have trained an example of "Thorsten voice" with the provided VITS recipe (~60K steps). The result is good, but the rhythm of the speech is not as good as that of the original "Thorsten voice".

See here for a comparison: https://htmlpreview.github.io/?https://github.com/alexnanchen/tts/blob/main/examples.html

How can we improve it?

  • Do we need to train for more steps?
  • Are there some specific parameters to tune?
  • Do we need to fine-tune the model on "accelerated speech"?

Many thanks!

alexnanchen avatar Mar 19 '24 09:03 alexnanchen

Hi, if I remember correctly, we trained our Coqui-VITS model to nearly 1000k steps, but there were no improvements in quality, either audible or technical (MOSNET, DNSMOS, SRMR), beyond the 600k mark. I suggest continuing training to at least 300k steps.

domcross avatar Mar 19 '24 10:03 domcross
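
Editorial sketch, not part of the original replies: one way to check where a run stops improving is to score a few checkpoints with an objective metric (MOSNET, DNSMOS, or SRMR, as mentioned above) and find the first checkpoint after which no later one gains more than a small margin. The helper name `plateau_step`, the `min_gain` threshold, and the scores below are all hypothetical and made up for illustration; they are not measurements from the Thorsten-Voice models.

```python
def plateau_step(scores, min_gain=0.02):
    """Return the earliest training step after which no later checkpoint
    beats the best score seen so far by more than `min_gain`.

    `scores` is a list of (step, score) pairs, higher score = better.
    """
    if not scores:
        raise ValueError("scores must contain at least one (step, score) pair")

    ordered = sorted(scores)  # sort by training step
    best_so_far = float("-inf")
    for i, (step, score) in enumerate(ordered):
        best_so_far = max(best_so_far, score)
        later_scores = [s for _, s in ordered[i + 1:]]
        # Plateau: every later checkpoint improves by at most min_gain.
        if all(s - best_so_far <= min_gain for s in later_scores):
            return step


# Made-up example scores (NOT real results), one per evaluated checkpoint.
example_scores = [
    (100_000, 3.10),
    (300_000, 3.45),
    (600_000, 3.60),
    (800_000, 3.61),
    (1_000_000, 3.60),
]

print(plateau_step(example_scores))
```

With these invented numbers, the gains after the 600k checkpoint stay below the threshold, so training beyond that point would not be expected to pay off. A single metric is only a rough guide; listening to samples remains the final check.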

Thank you!

alexnanchen avatar Mar 20 '24 14:03 alexnanchen