vits
Training for custom dataset
Has anyone tried training VITS on a dataset in another language? Did it produce natural, high-quality speech? Are there any detailed instructions for training on a custom dataset? Thank you.
You can use my fork; it's a work in progress. I haven't fine-tuned any models yet, but the training loop works.
https://github.com/nivibilla/efficient-vits-finetuning
Thank you. I can run the training with ESPnet, but the output quality is not as good as expected, so I'm looking for tricks or advice on properly fine-tuning on a dataset in another language.
I'm not sure how to train on other languages, but there are a couple of fine-tuning repos for Chinese and Japanese that would probably help. My fork only supports English.
You can use Piper to train and run inference for any number of languages out of the box. We trained a VITS model in Hindi that sounds similar in quality to English. We also trained on a custom in-house English dataset (in LJSpeech format) and got good accented English. However, the tonality is not accurate, and most voices sound monotonous.
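For anyone preparing their own data: the LJSpeech format mentioned above is essentially a `wavs/` folder of clips plus a pipe-separated `metadata.csv`. Below is a minimal Python sketch of writing and reading such a file from your own transcripts. It assumes clips are named `wavs/<id>.wav` and uses the three-column `id|text|normalized_text` layout of the original LJSpeech release; the helper names are hypothetical and not part of Piper, ESPnet, or any other tool, so check which columns your training recipe actually expects (some accept just `id|text`).

```python
from pathlib import Path


def write_ljspeech_metadata(dataset_dir, utterances):
    """Write an LJSpeech-style metadata.csv (hypothetical helper).

    `utterances` is an iterable of (utterance_id, transcript) pairs. Each id is
    assumed to have a matching clip at <dataset_dir>/wavs/<id>.wav.
    Rows are written as `id|text|normalized_text`, using the same cleaned
    transcript for both text columns (no real text normalization is done).
    """
    dataset_dir = Path(dataset_dir)
    wav_dir = dataset_dir / "wavs"
    rows = []
    for utt_id, text in utterances:
        # The pipe is the field separator, so it must not appear in the transcript.
        if "|" in text:
            raise ValueError(f"transcript for {utt_id!r} contains '|'")
        # Fail early if a clip is missing rather than during training.
        if not (wav_dir / f"{utt_id}.wav").is_file():
            raise FileNotFoundError(wav_dir / f"{utt_id}.wav")
        # Collapse newlines and repeated spaces so each clip stays on one line.
        clean = " ".join(text.split())
        rows.append(f"{utt_id}|{clean}|{clean}")
    path = dataset_dir / "metadata.csv"
    # UTF-8 so non-Latin scripts (e.g. Devanagari for Hindi) round-trip correctly.
    path.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return path


def read_ljspeech_metadata(path):
    """Return (utterance_id, text) pairs, taking the last column as the text."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("|")
        entries.append((fields[0], fields[-1]))
    return entries
```

Validating that every row has a matching WAV before training starts is a cheap way to catch broken datasets; actual text normalization (numbers, abbreviations, language-specific cleanup) is left out here and usually matters a lot for non-English data.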