vits
Mispronounced words and 44.1 kHz audio
- Some people claim that mispronunciation is one of the noticeable disadvantages of the VITS model. I have run into the same problem. Does anybody know what causes it?
- I used a 44.1 kHz dataset to train the model. Because of the higher resolution of the data, noise seems to be more noticeable in the synthesized speech. Can anybody give me some suggestions for this problem?
It could be a problem with the eSpeak phonemizer. You can edit the text preprocessing scripts so that they accept IPA phonemes directly, then change the phonemes as you need.
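A rough sketch of what that could look like, in case it helps. This is not the repository's actual preprocessing code: the symbol list, the `{...}` marker for hand-written IPA, and the names `SYMBOLS`, `ipa_to_sequence`, and `to_phonemes` are all assumptions made up for illustration. In a real setup you would reuse the symbol table the model was trained with and pass your actual phonemizer as the `phonemize` argument.

```python
import re
from typing import Callable, List

# Hypothetical symbol inventory (assumption); use your model's own symbol list.
SYMBOLS = ["_"] + list(" ,.!?") + list("abdefhijklmnopstuvwzæðŋɑɔəɛɪɹʃʊʌʒθˈˌː")
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

# Hypothetical convention: text inside {...} is already IPA and is not phonemized.
_IPA_SPAN = re.compile(r"\{([^{}]*)\}")


def to_phonemes(text: str, phonemize: Callable[[str], str]) -> str:
    """Phonemize plain text, but pass {...} spans through unchanged."""
    parts = []
    pos = 0
    for match in _IPA_SPAN.finditer(text):
        if match.start() > pos:
            parts.append(phonemize(text[pos:match.start()]))
        parts.append(match.group(1))  # hand-written IPA, kept as-is
        pos = match.end()
    if pos < len(text):
        parts.append(phonemize(text[pos:]))
    return "".join(parts)


def ipa_to_sequence(ipa: str) -> List[int]:
    """Map an IPA string to symbol IDs, failing loudly on unknown symbols."""
    unknown = sorted({ch for ch in ipa if ch not in SYMBOL_TO_ID})
    if unknown:
        raise ValueError(f"symbols not in the inventory: {unknown}")
    return [SYMBOL_TO_ID[ch] for ch in ipa]
```

With something like this you could write `I {ɹˈɛd} it` in the training or inference text to force one pronunciation for a word the phonemizer gets wrong. Raising on unknown symbols, rather than silently dropping them, makes it easier to spot phonemes that never reach the model.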
Hi, I also ran into the mispronunciation issue when using Chinese phonemes as input. Any update on this? The model trained on the LJSpeech dataset with IPA input does not seem to suffer from mispronunciation, or is it just that English is not my mother tongue, so I fail to notice the mispronounced cases?