
Japanese fine-tuning

Kamikadashi opened this issue 1 year ago · 4 comments

From what I understand, the model currently requires fine-tuning on at least 2-3 hours of speech data to produce convincing results in Japanese. Is this correct? Additionally, is it necessary to fine-tune only the SoVITS model, or does the GPT model require it as well?

Kamikadashi · Jan 27 '24

Fine-tuning both SoVITS and GPT is better (it gives higher similarity). Which epoch count is best comes down to experience: you can test the weights saved at each epoch during inference and pick the one that sounds best.

RVC-Boss · Jan 27 '24
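
For illustration, a minimal Python sketch of that checkpoint sweep: render the same test line with every saved GPT/SoVITS epoch pair, then listen and compare. The directory names assume the default GPT-SoVITS output folders, and `synthesize` is a hypothetical placeholder for whatever inference entry point you actually use, not a real API.

```python
from pathlib import Path

# Assumed default output folders for fine-tuned weights.
GPT_DIR = Path("GPT_weights")
SOVITS_DIR = Path("SoVITS_weights")
TEST_LINE = "これはチェックポイント比較用のテスト文です。"  # any fixed test sentence

def synthesize(gpt_ckpt: Path, sovits_ckpt: Path, text: str) -> bytes:
    """Hypothetical placeholder: load the two checkpoints, run inference,
    and return WAV bytes. Wire this to your actual inference setup."""
    raise NotImplementedError

out_dir = Path("ab_test")
out_dir.mkdir(exist_ok=True)

# Render the same line with every saved epoch pair for an A/B listening test.
for gpt_ckpt in sorted(GPT_DIR.glob("*.ckpt")):
    for sovits_ckpt in sorted(SOVITS_DIR.glob("*.pth")):
        wav = synthesize(gpt_ckpt, sovits_ckpt, TEST_LINE)
        (out_dir / f"{gpt_ckpt.stem}__{sovits_ckpt.stem}.wav").write_bytes(wav)
```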

2-3 hours is certainly enough.

RVC-Boss · Jan 27 '24

Thanks for the answer, I'll experiment. What does changing 文本模块学习率权重 (the text module learning rate weight) achieve?

As I understand it, Chinese currently needs less data than Japanese to reach comparable quality. Will this improve for Japanese in the future? Is there an ETA?

Kamikadashi · Jan 27 '24

文本模块学习率权重 (text module learning rate weight): during the fine-tuning stage, it reduces the learning rate of the text comprehension module so that overfitting does not cause anomalous articulation (see the sketch after this comment).

You can try fine-tuning on just 10 minutes of Japanese audio using the default epoch settings.

RVC-Boss · Jan 28 '24
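
For context, a generic PyTorch sketch of what a text-module learning-rate weight means conceptually: the text encoder's parameter group trains at the base learning rate scaled by the weight, while the rest of the model trains at the base rate. This is not GPT-SoVITS's actual training code; the module names and values are illustrative.

```python
import torch

class TTSModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.text_encoder = torch.nn.Linear(256, 256)  # stands in for the text module
        self.acoustic = torch.nn.Linear(256, 256)      # stands in for everything else

model = TTSModel()
base_lr = 1e-4
text_lr_weight = 0.4  # < 1.0 slows the text module to curb overfitting

# Two parameter groups: the text module gets a scaled-down learning rate.
optimizer = torch.optim.AdamW([
    {"params": model.text_encoder.parameters(), "lr": base_lr * text_lr_weight},
    {"params": model.acoustic.parameters(), "lr": base_lr},
])
```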