GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Results 1028 GPT-SoVITS issues

py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result...

In follow-up

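The text to synthesize was “文本切分工具。太长的文本合成出来效果不一定好,所以太长建议先切。合成会根据文本的换行分开合成再拼起来。” ("Text splitting tool. Very long text does not necessarily synthesize well, so splitting it first is recommended. Synthesis splits the text at line breaks, synthesizes each part separately, and then joins the results."). After synthesis, characters are sometimes dropped; for example, “文本切分工具。” is lost. My guess is that word segmentation is the cause. Also, “换行” in this sentence should be read "huan hang", but the synthesized audio reads it as "huang xing". Is this a problem with the TTS engine?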
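The quoted text describes synthesis as splitting on line breaks, synthesizing each part, and joining the results. A minimal sketch of that flow follows; it is not the project's actual code. `split_on_line_breaks`, `synthesize_by_line`, and `fake_synthesize` are hypothetical names, and the stand-in synthesizer only returns placeholder samples. Printing the segments before synthesis is one way to check whether a sentence such as “文本切分工具。” is already missing at the splitting stage.

```python
from typing import Callable, List


def split_on_line_breaks(text: str) -> List[str]:
    """Return the non-empty lines of `text`, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def synthesize_by_line(text: str, synthesize: Callable[[str], List[float]]) -> List[float]:
    """Synthesize each line separately and concatenate the audio samples in order."""
    audio: List[float] = []
    for segment in split_on_line_breaks(text):
        audio.extend(synthesize(segment))
    return audio


def fake_synthesize(segment: str) -> List[float]:
    # Hypothetical stand-in for a real TTS call: one silent sample per character.
    return [0.0] * len(segment)


sample = "文本切分工具。\n太长的文本合成出来效果不一定好,所以太长建议先切。"
segments = split_on_line_breaks(sample)
audio = synthesize_by_line(sample, fake_synthesize)

# Inspect the segments to confirm nothing was dropped before synthesis.
for number, segment in enumerate(segments, start=1):
    print(number, segment)
```

If every segment is present here but missing from the audio, the loss happens later than the splitting step.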

todolist

This project is great. I would like to inquire about the scale of the data used to train the model, and the quality of the data (whether it's accurately labeled...

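I originally just wanted to see what would happen if I used my own model with someone else's audio as the reference. It seems to perform a kind of addition: the timbre is averaged between the two, which produces a new voice. My own model was trained without any English speech, so it cannot speak English well, but when I use a voice professional's English speech as the reference, it speaks very good English, and even mixed Chinese and English. The timbre is not my own, though; it is as if the timbre equals mine plus the reference, while the tone and intonation come from the reference. Great work. [Example](https://github.com/RVC-Boss/GPT-SoVITS/assets/17892787/d421cee9-5a0c-447f-a9a2-aa610676ec23)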

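After a long sentence is split, a separate audio clip is generated for each resulting segment. Could unsatisfactory segments be re-rolled (regenerated) individually and everything then merged into one long audio file at the end? Alternatively, add an option to cut the long audio manually (similar to how subfix cuts audio) and then rerun inference on the poorer segments.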

Generate several candidates in one run, to reduce the number of re-rolls and make it easier to compare the generated audio clips.
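The two requests above (re-rolling individual segments and generating several candidates per run) could be combined roughly as sketched below. This is only an illustration of the requested workflow, not an existing GPT-SoVITS feature: `generate_candidates`, `merge_selection`, and `fake_synthesize` are hypothetical names, and the synthesizer is a placeholder that takes a text segment and a seed.

```python
import random
from typing import Callable, Dict, List

Audio = List[float]
Synth = Callable[[str, int], Audio]


def generate_candidates(
    segments: List[str], synthesize: Synth, n_candidates: int, seed: int = 0
) -> List[List[Audio]]:
    """Produce `n_candidates` takes per segment, each with its own random seed."""
    rng = random.Random(seed)
    return [
        [synthesize(segment, rng.randrange(2**31)) for _ in range(n_candidates)]
        for segment in segments
    ]


def merge_selection(candidates: List[List[Audio]], chosen: Dict[int, int]) -> Audio:
    """Concatenate one take per segment; segments not in `chosen` use take 0."""
    merged: Audio = []
    for index, options in enumerate(candidates):
        merged.extend(options[chosen.get(index, 0)])
    return merged


def fake_synthesize(segment: str, seed: int) -> Audio:
    # Hypothetical stand-in for a real TTS call; the seed only varies the value.
    return [float(seed % 7)] * len(segment)


segments = ["first segment", "second segment"]
candidates = generate_candidates(segments, fake_synthesize, n_candidates=3)
# Keep the default take for segment 0 and pick the third take for segment 1.
merged = merge_selection(candidates, chosen={1: 2})
```

Keeping every take per segment means a poor segment can be swapped out without regenerating the others.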

My file path is correct, double checked... but still getting below error, can someone pls help? (replica) PS D:\dev\replica\GPT-SoVITS> python webui.py Running on local URL: http://0.0.0.0:9874 "D:\conda_envs\replica\python.exe" tools/slice_audio.py "D:\dev\replica\audio\input" "output\slicer_opt"...

From what I understand, the model currently requires fine-tuning on at least 2-3 hours of speech data to produce convincing results in Japanese. Is this correct? Additionally, is it necessary...

I see this message in the second tab, which is strange. My GPU is a 4090, so why can't it be found?

Download the moda ASR and nltk-related models during the Docker image build stage to speed up the first run.