TingC-95
https://github.com/Rayhane-mamah/Tacotron-2/blob/ab5cb08a931fc842d3892ebeb27c8b8734ddd4b8/tacotron/feeder.py#L201
The same model runs inference fine on GPU, but on CPU I get the following error: Traceback (most recent call last): File "/content/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 141, in vc_single if_f0 = cpt.get("f0", 1) NameError: name 'cpt' is not defined Traceback (most recent call last): File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line...
**Is your feature request related to a problem? Please describe.** Yes, the problem is that current models may not always provide accurate predictions, especially when dealing with complex and diverse...
Hi, I'm running evaluation.py on MNLI as described in the README, but I'm getting different results compared to what's displayed there. I'm using Google Colab for this, and you can...
I found that adjusting noise_scale_w affects the smoothness of the synthesized speech. When noise_scale_w is close to 1, the speech speed is slower and the speech is...
I found that VITS's MAS result is very accurate, so why not distil the duration information to train the student model?
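To make the question concrete, here is a minimal sketch of how per-token durations could be read off a hard monotonic alignment and then used as targets for a student duration predictor. The function name `durations_from_alignment` and the toy matrix are hypothetical; the sketch assumes the alignment is a 0/1 matrix of shape (frames, tokens) in which each frame is assigned to exactly one token, and it is not taken from the VITS code.

```python
import numpy as np


def durations_from_alignment(attn: np.ndarray) -> np.ndarray:
    """Count how many frames a hard alignment assigns to each text token.

    attn: 0/1 matrix of shape (n_frames, n_tokens); each row has a single 1.
    Returns an integer array of shape (n_tokens,).
    """
    return attn.sum(axis=0).astype(np.int64)


# Toy alignment (assumed data): 6 frames aligned monotonically to 3 tokens.
attn = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
])

durations = durations_from_alignment(attn)

# Sanity check: the durations must account for every frame exactly once.
assert durations.sum() == attn.shape[0]
print(durations)
```

Durations extracted this way could serve as regression targets for the student, for example in the log domain, though whether that helps in practice is exactly what the question asks.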
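About the student model
Why does the student model reuse only the teacher model's enc_q and flow, and not the text encoder? When tuning a student model, is it better to transfer from a teacher model trained on the same dataset, or from another student model? How long does a student model usually take to converge?
At the start of training, GPU memory usage fluctuates sharply and training easily runs out of memory; after a while, memory usage drops and GPU memory utilization stays fairly low. Has anyone else observed this? What causes it?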
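Speech rate issue
First of all, thanks for open-sourcing this; the synthesis quality is great. I tried fine-tuning the model and found that the synthesized audio is always slower than the original speech, even for utterances from the training set. I later found that the ceil operation at https://github.com/PlayVoice/vits_chinese/blob/5b662006ff016f749e6c76a15b4e8e8210a4e1cf/models.py#L562 may be the cause. But taking floor instead makes the speech too fast. Are there any suggestions for improving this?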
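One possible middle ground between ceil and floor, sketched here under assumptions: round the cumulative end position of each token rather than each duration separately, so the rounding error does not accumulate across the utterance. The helper `cumulative_round` and the sample durations are hypothetical; this is not the repository's code and has not been tested against the model.

```python
import numpy as np


def cumulative_round(w: np.ndarray) -> np.ndarray:
    """Convert real-valued durations to integer frame counts.

    Rounds the running end position of each token, then takes differences,
    so the total length stays within half a frame of sum(w).
    """
    ends = np.rint(np.cumsum(w)).astype(np.int64)
    return np.diff(ends, prepend=0)


# Assumed predicted durations in frames (real-valued), total 10.2.
w = np.array([2.3, 1.4, 3.6, 0.7, 2.2])

d_ceil = np.ceil(w).astype(np.int64)    # always rounds up: longer, slower speech
d_floor = np.floor(w).astype(np.int64)  # always rounds down: shorter, faster speech
d_cum = cumulative_round(w)             # total tracks sum(w)

print("ceil :", d_ceil, d_ceil.sum())
print("floor:", d_floor, d_floor.sum())
print("cum  :", d_cum, d_cum.sum())
```

With these sample values ceil yields 13 frames and floor yields 8, against a real-valued total of 10.2, which matches the "too slow" and "too fast" observations. Note that cumulative rounding can assign zero frames to a very short token, so a minimum of one frame may need to be enforced where that matters.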