vits_chinese

Best-practice TTS based on BERT and VITS, with some NaturalSpeech features from Microsoft; supports ONNX streaming output.

Results 65 vits_chinese issues

Hey, is it possible to adapt this model to train on an English dataset? Or should I just use normal VITS?

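Hello author, I would like to improve the handling of erhua (rhotacized finals) based on your code. Could you suggest some concrete approaches? Thanks.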

Hello MaxMax2016, Thank you for sharing your code on the improved VITS, I hope to check with you about the model behaviors when adding the bi-directional KL divergence. In this...

good first issue

Are there any requirements for the data source? I see the project uses the Pinyin library; does that mean it can only be used for Mandarin?

I ported this project into another project to use it as a tool, and a file raises an error: `vits_chinese\monotonic_align\__init__.py", line 3, in ` `from tools.vits_chinese.monotonic_align.core import maximum_path_c ModuleNotFoundError: No module named 'tools.vits_chinese.monotonic_align.core'` Because of the migration, the imports in many files now use absolute paths. `import numpy as np import torch from tools.vits_chinese.monotonic_align.core import maximum_path_c def maximum_path(neg_cent, mask): """Cython...

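Have you tried training on multiple speakers after adding the prosody features? In my experiments, multi-speaker training does not perform as well as single-speaker training; the single-speaker results are very realistic.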

**I followed README.md, and the training step below fails. How should I handle this?** ### Train download baker data: https://www.data-baker.com/data/index/TNtts/ change sample rate of waves to 16kHz, and put waves to ./data/waves put 000001-010000.txt to ./data/000001-010000.txt > python vits_prepare.py -c ./configs/bert_vits.json ###...

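Hello, I have a question. My audio clips average about 4 s; 25% of them are longer than 5 s, and the longest is 10 s. Does segment_size need to be increased here? If segment_size is too large, GPU memory may run out. During actual training, is segment_size applied in the decoder, so that only one segment is selected for training? For long audio, to make full use of the data, do the clips need to be trimmed to a certain duration range beforehand?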
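
For context on the segment_size question: in the original VITS training code, the decoder is fed only a randomly chosen window of `segment_size` samples per utterance, while the encoders still see the whole utterance. The sketch below assumes this repository keeps that scheme. The `hop_length` and `segment_size` values are illustrative assumptions, not values read from `bert_vits.json`, and `rand_slice_start` is a hypothetical helper written for this example.

```python
import random


def rand_slice_start(num_frames, segment_frames, rng):
    """Pick a start frame so that [start, start + segment_frames) fits in the clip.

    Hypothetical helper; clips shorter than the window start at frame 0.
    """
    max_start = max(0, num_frames - segment_frames)
    return rng.randint(0, max_start)  # inclusive on both ends


sampling_rate = 16000  # 16 kHz, as in the README excerpt
hop_length = 256       # assumption: samples per spectrogram frame
segment_size = 8192    # assumption: decoder window length in samples

segment_frames = segment_size // hop_length              # window length in frames
clip_seconds = 10                                        # longest clip in the question
num_frames = clip_seconds * sampling_rate // hop_length  # frames in that clip

rng = random.Random(0)
start = rand_slice_start(num_frames, segment_frames, rng)
print(f"decoder sees frames {start}..{start + segment_frames} of {num_frames}")
```

Under these assumptions a 10 s clip is never trimmed. A different window is drawn each time the clip is visited, so over many epochs the decoder can see all of it, and decoder memory depends on `segment_size` rather than on clip length.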

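First of all, thank you for open-sourcing this; the synthesis quality is great. I tried fine-tuning the model and found that the synthesized audio is always slower than the original speech rate, even for data from the training set. I later found that the ceil operation at https://github.com/PlayVoice/vits_chinese/blob/5b662006ff016f749e6c76a15b4e8e8210a4e1cf/models.py#L562 may be the cause, but using floor instead makes the speech too fast. Are there any suggestions for improving this?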
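
To illustrate the arithmetic behind this report: rounding every predicted duration up adds almost one frame per phoneme in the worst case, and rounding down removes about as much. The sketch below compares the totals on made-up durations and shows one possible compromise, rounding while carrying the fractional remainder forward. The numbers and the `carry_round` helper are hypothetical; this is not the repository's code or a fix confirmed by the maintainer.

```python
import math


def total_frames(durations, mode):
    """Total frame count after rounding each predicted duration independently."""
    if mode == "ceil":
        return sum(math.ceil(d) for d in durations)
    if mode == "floor":
        return sum(math.floor(d) for d in durations)
    if mode == "round":
        return sum(math.floor(d + 0.5) for d in durations)
    raise ValueError(f"unknown mode: {mode}")


def carry_round(durations):
    """Round each duration, carrying the rounding error into the next one.

    Hypothetical compromise: the running total stays within about half a
    frame of the unrounded sum, though a very short phoneme may get 0 frames.
    """
    frames, carry = [], 0.0
    for d in durations:
        target = d + carry
        n = max(0, math.floor(target + 0.5))
        carry = target - n
        frames.append(n)
    return frames


# Made-up per-phoneme durations in frames (not model output).
predicted = [2.3, 4.6, 1.2, 3.7, 5.1, 2.4]

print("unrounded sum:", round(sum(predicted), 1))
for mode in ("ceil", "floor", "round"):
    print(mode, total_frames(predicted, mode))
print("carry_round", sum(carry_round(predicted)))
```

On these values ceil gives 23 frames and floor gives 17 against an unrounded sum of 19.3, which matches the slower and faster speech described above. Plain rounding and the carry variant both give 19.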

enhancement