Lars Nieradzik
I noticed that the default Mandarin acoustic model uses [phone groups](https://github.com/MontrealCorpusTools/mfa-models/blob/main/config/acoustic/phone_groups/mandarin_mfa.yaml) to combine tones. Does this mean that tones have no effect on the alignment?
Add WV-MOS from https://arxiv.org/pdf/2203.13086. Code is here: https://github.com/AndreevP/wvmos/tree/main. Also relevant: https://www.arxiv.org/pdf/2407.12707. On “TTS Arena”, UTMOSv1 correlates only weakly with the leaderboard, while WV-MOS has much better results....
`weight_norm` is not required for inference. Furthermore, in the latest version of PyTorch it triggers "UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.".
哟 = yo1 = /jɔ˥/

```
>>> pinyin_to_ipa("yo1")
OrderedSet([('w', 'o˥')])
```
Hey, could you add support for erhua? Combinations such as 事儿 = shìr are not handled. Erhua is often heard even in Standard Chinese (news etc.).
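
To make the request concrete, here is a minimal sketch of the detection step, assuming erhua is written as a trailing `r` on the pinyin syllable (as in shìr or shir4). The helper name `split_erhua` is hypothetical and not part of the library. It only separates the base syllable from the erhua marker and does not attempt the actual rhotacized IPA conversion.

```python
import unicodedata
from typing import Tuple


def split_erhua(syllable: str) -> Tuple[str, bool]:
    """Return (base_syllable, has_erhua) for a single pinyin syllable.

    Hypothetical helper: assumes erhua is spelled as a trailing "r".
    """
    # Separate a trailing tone digit, if present (e.g. "shir4" -> "shir" + "4").
    tone = ""
    if syllable and syllable[-1].isdigit():
        syllable, tone = syllable[:-1], syllable[-1]

    # Remove tone marks, but only for the comparison against plain "er".
    plain = "".join(
        ch
        for ch in unicodedata.normalize("NFD", syllable)
        if not unicodedata.combining(ch)
    ).lower()

    # The standalone syllable "er" (儿 = ér) ends in "r" but is not erhua.
    if plain.endswith("r") and plain != "er":
        return syllable[:-1] + tone, True
    return syllable + tone, False


print(split_erhua("shìr"))
print(split_erhua("er2"))
```

The base syllable could then go through the existing conversion, with rhotacization applied to the resulting final as a separate step.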