MaxMax2016

UESTC, Chengdu, China. Computer Vision, Speech Separation, Speech Synthesis, LLMs

Results: 243 comments of MaxMax2016

Best TTS based on BERT and VITS with some Natural Speech Features Of Microsoft

Yes.

Best TTS based on BERT and VITS with some Natural Speech Features Of Microsoft

That small amount of TTS annotation is there for fine-tuning the model; see https://github.com/PlayVoice/vits_chinese/issues/57

Best TTS based on BERT and VITS with some Natural Speech Features Of Microsoft

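I don't fully understand his approach either, but I used BERT base plus his linear layer, and then added one more linear layer for dimension conversion so the features can be embedded into VITS. That dimension-conversion layer can presumably also learn, during VITS training, a prosody representation that matches the audio. Actually, I think using BERT base directly would work too, and I plan to run that experiment as well.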
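The dimension-conversion step described above can be sketched roughly as follows. This is only an illustration, not the code used in vits_chinese: it uses NumPy instead of a deep-learning framework, the 192-wide text encoder is an assumed value, and adding the projected features to the phoneme embeddings (rather than concatenating them) is also an assumption. The names `Linear`, `fuse`, and `VITS_DIM` are hypothetical.

```python
import numpy as np

BERT_DIM = 768   # hidden size of BERT base
VITS_DIM = 192   # assumed width of the VITS text encoder (hypothetical here)

rng = np.random.default_rng(0)


class Linear:
    """Minimal linear layer: y = x @ W + b (weights are random, not trained)."""

    def __init__(self, in_dim, out_dim):
        self.weight = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, out_dim))
        self.bias = np.zeros(out_dim)

    def __call__(self, x):
        return x @ self.weight + self.bias


def fuse(phoneme_emb, bert_feats, proj):
    """Project BERT features to the encoder width and add them per token."""
    return phoneme_emb + proj(bert_feats)


num_tokens = 6
phoneme_emb = rng.normal(size=(num_tokens, VITS_DIM))  # stand-in phoneme embeddings
bert_feats = rng.normal(size=(num_tokens, BERT_DIM))   # stand-in BERT outputs
proj = Linear(BERT_DIM, VITS_DIM)

fused = fuse(phoneme_emb, bert_feats, proj)
print(fused.shape)
```

In a real model the projection weights would be trained jointly with VITS, which is what would let the layer pick up prosody-related information; here they are random and only the shapes are meaningful.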

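> > @FanhuaandLuomu The input is the sequence of pinyin initials and finals. Previously I was worried that inserting blanks would double the input length, slow down the engineering implementation, and so hurt first-packet latency and RTF. Now that the blanks have been added, the pronunciation problems are gone. With blanks, first-packet latency is 100 ms and the overall RTF is about 0.03, which is acceptable.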

@15755841658 The dropped-sound problem has been solved: https://github.com/PlayVoice/vits_chinese
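As a rough illustration of the blank insertion and the RTF figure discussed above, here is a minimal sketch. The blank symbol `"_"`, the example phonemes, and the timing values are made up for illustration, and `real_time_factor` is a hypothetical helper; RTF is taken here in its usual sense of synthesis time divided by audio duration.

```python
def intersperse(symbols, blank):
    """Return `symbols` with `blank` placed before, between, and after every item."""
    result = [blank] * (len(symbols) * 2 + 1)
    result[1::2] = symbols
    return result


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    return synthesis_seconds / audio_seconds


# Hypothetical initial/final sequence, for illustration only.
phonemes = ["n", "i3", "h", "ao3"]
with_blanks = intersperse(phonemes, "_")
print(with_blanks)

# Illustrative numbers: 0.3 s of compute for 10 s of audio.
print(real_time_factor(0.3, 10.0))
```

The sequence grows from n to 2n + 1 symbols, which is the roughly doubled input length the quoted comment was concerned about.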

About Opencpop transcriptions.txt

Sorry, that information is indispensable for singing voice synthesis.

About Opencpop transcriptions.txt

Oh, that one is timbre conversion, which goes from audio to audio; this one goes from lyrics to audio.

About release models and VISinger

https://github.com/PlayVoice/VI-SVS/releases/tag/0.0.1

Roughly how long does training on the Biaobei (标贝) data take before the model is usable?

One day.

Could the model support streaming use?

Whisper does not support streaming.

Wondering how much promotion in noisy scene when adding perturbation

feature perturbation
