GPT-SoVITS

Questions about the scale and quality of training data, and the possibility of releasing the training data in the future

Open hertz-pj opened this issue 1 year ago • 5 comments

This project is great. I would like to inquire about the scale of the data used to train the model, and the quality of the data (whether it's accurately labeled or converted from ASR).

Is there any plan to release the training data for public use in the future? How can I contribute data to this project?

hertz-pj, Jan 26 '24 07:01

Very little data is required: only 1 to 2 minutes of speech is needed.

kokomi12345, Feb 14 '24 16:02

Regarding the release of training data, all I can say is that you have to find training data on the Internet yourself, but you can also ask a good Samaritan (such as me) for training data!

kokomi12345, Feb 14 '24 16:02

Also, the audio labels may contain inaccuracies, but with UVR5 built in, you can use the tools to improve the quality of your dry-vocal datasets!

kokomi12345, Feb 14 '24 16:02

Regarding the release of training data, all I can say is that you have to find training data on the Internet yourself, but you can also ask a good Samaritan (such as me) for training data!

Did you provide the training data used for this project? How can I obtain the data from you?

hertz-pj, Feb 19 '24 08:02

How should I put it: you are abroad and I am in China, so uploading the dataset is more difficult. If a network disk (cloud storage) link is specified, I can upload a compressed archive of the dataset there, and you can pick it up!

kokomi12345, Feb 19 '24 09:02