Training for custom dataset

Open huydang2106 opened this issue 1 year ago • 4 comments

Has anyone tried VITS on a dataset in another language? Did it produce natural, high-quality speech? Are there any detailed instructions for training on a custom dataset? Thank you.

huydang2106, Mar 01 '23 02:03

You can use my fork; it's a work in progress. I haven't tuned any models yet, but the training loop works.

https://github.com/nivibilla/efficient-vits-finetuning

nivibilla, Apr 24 '23 17:04

Thank you. I can do the training with ESPnet, but the output quality is not as good as expected, so I am looking for tricks or advice on properly fine-tuning on a dataset in another language.

huydang2106, Apr 26 '23 07:04

I'm not sure how to train on different languages, but there are a couple of fine-tuning repos for Chinese and Japanese that would probably help. My fork is English-only.

nivibilla, Apr 26 '23 08:04

You can use Piper to train and run inference for any number of languages out of the box. We trained a VITS model in Hindi that sounds similar to English. We also trained on a custom in-house English dataset (LJSpeech format) and got good accented English. The tonality is not accurate, though, and most of the voices sound monotonous.

athenasaurav, Jun 20 '23 08:06
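
For anyone preparing a custom dataset in the LJSpeech format mentioned in the last comment, here is a minimal sanity-check sketch. It is not taken from any of the repos in this thread. It assumes the usual LJSpeech layout: a `metadata.csv` with pipe-separated fields (clip ID first, transcript last) next to a `wavs/` folder holding one `<id>.wav` per row. The function name `check_ljspeech_dataset` is hypothetical, and the exact column count your training tool expects may differ, so check its documentation.

```python
import csv
from pathlib import Path


def check_ljspeech_dataset(root):
    """Return a list of problems found in an LJSpeech-style dataset folder.

    Assumed layout (an assumption, adjust to your tool):
        root/metadata.csv   rows like "clip_id|transcript[|normalized transcript]"
        root/wavs/<clip_id>.wav
    """
    root = Path(root)
    metadata = root / "metadata.csv"
    if not metadata.is_file():
        return [f"missing {metadata.name}"]

    problems = []
    with metadata.open(encoding="utf-8", newline="") as f:
        # LJSpeech uses '|' as the delimiter and no quoting.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) < 2:
                problems.append(f"line {line_no}: expected at least 2 fields")
                continue
            clip_id, text = row[0].strip(), row[-1].strip()
            if not text:
                problems.append(f"line {line_no}: empty transcript for {clip_id}")
            if not (root / "wavs" / f"{clip_id}.wav").is_file():
                problems.append(f"line {line_no}: missing wavs/{clip_id}.wav")
    return problems
```

Running `check_ljspeech_dataset("path/to/dataset")` before training returns an empty list when every row has a transcript and a matching audio file. Mismatched IDs and empty transcripts are cheap to catch this way, whatever the language of the dataset.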