Fix mixed Chinese and English pronunciation
The original code handles pure Chinese pronunciation very well, but with mixed Chinese and English text, the English part of the pronunciation is lost.
I have improved this behavior. However, download_model.py is not updated because there is no download address for Kokoro-82M-v1.1-zh in the release.
#214
@chai51 I'm not sure I understand what exactly this is trying to fix, or how it fixes it?
@fireblade2534
Purely a guess on my part, but I think this is a request to support "Kokoro-82M-v1.1-zh", which is supposed to be better with at least Chinese, and which handles only Chinese and English.
When the Voice is set to zf_xiaoyi and the Language is set to Chinese, the following two test cases illustrate the problem:
"该模型是经过短期训练的结果,从专业数据集中添加了 100 名中文使用者。" ("This model is the result of short-term training, adding 100 Chinese speakers from a professional dataset.") The synthesized pronunciation of this text is completely accurate.
"Kokoro 是一系列体积虽小但功能强大的 TTS 模型。" ("Kokoro is a series of small but powerful TTS models.") In this sentence, the pronunciations of "Kokoro" and "TTS" are incorrect.
Previously, mixed Chinese and English text would lose the English part of its pronunciation. I used the new Kokoro module to handle English and Chinese separately, which solves the problem of lost English pronunciation. Since mixing Chinese and English is very common, this improvement will make Chinese scenarios more versatile.
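To make "handle English and Chinese separately" concrete, here is a minimal, hypothetical sketch (not code from this PR) of how mixed text can be split into Chinese and English runs before each run is sent to the matching pipeline. The character ranges and the `split_zh_en` helper are my own assumptions for illustration.

```python
import re

# Assumed character classes: CJK ideographs plus CJK/fullwidth punctuation.
# Everything outside these ranges is treated as "English" (Latin) text.
CJK = r'\u4e00-\u9fff\u3000-\u303f\uff00-\uffef'

def split_zh_en(text: str):
    """Yield (lang, segment) pairs, where lang is 'zh' or 'en'."""
    for match in re.finditer(rf'[{CJK}]+|[^{CJK}]+', text):
        seg = match.group()
        lang = 'zh' if re.match(rf'[{CJK}]', seg) else 'en'
        yield lang, seg

print(list(split_zh_en("Kokoro 是一系列 TTS 模型")))
```

Each `zh` segment can then go through the Chinese G2P path and each `en` segment through the English one, so neither side's pronunciation is dropped.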
@chai51 So what you're saying is that if the language is Chinese, instead of loading the normal Kokoro v1 model it loads Kokoro v1.1. Some questions are:
- What converts the Chinese and English text to phonemes in such a way that both English and Chinese pronunciation are maintained?
- Does it load v1.1 and v1 at the same time?
- Does it use the text normalization system for the English parts of the text, or is that skipped? (Text normalization only works for English right now, so it is automatically disabled if the lang code requests a different language.)
Yeah, you're right.
- The submitted code (api/src/inference/kokoro_v1.py:87) passes a callback function to KPipeline, which separates the Chinese and English parts. At api/src/inference/kokoro_v1.py:61, the English part is returned by the callback function and synthesized using an `a` (American English) KPipeline. For the specific implementation, see make_zh.py.
- Only v1.1 is loaded.
- Since English only appears as isolated words or abbreviations in mixed-pronunciation scenarios, the English part of the text normalization system is skipped.
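The callback handoff described above can be sketched as follows. This is a hedged illustration, not the PR's actual code: `en_callable` echoes the naming used in the Kokoro-82M-v1.1-zh examples, and the stub phonemizers stand in for the real Chinese pipeline and the English (`a`) pipeline.

```python
import re

def en_callable(text: str) -> str:
    """Stand-in for the English ('a' lang code) pipeline's G2P step."""
    return f"<en:{text.strip()}>"   # the real callback returns English phonemes

def zh_phonemize(text: str) -> str:
    """Stand-in for the Chinese G2P step."""
    return f"<zh:{text}>"

def phonemize_mixed(text: str, en_cb) -> str:
    """Route Latin runs through en_cb and everything else through the
    Chinese path, mirroring how the PR hands English segments back to
    the main pipeline via a callback."""
    parts = []
    for seg in re.findall(r'[A-Za-z][A-Za-z\s]*|[^A-Za-z]+', text):
        if seg[0].isascii() and seg[0].isalpha():
            parts.append(en_cb(seg))
        else:
            parts.append(zh_phonemize(seg))
    return ''.join(parts)

print(phonemize_mixed("Kokoro 是一系列 TTS 模型", en_callable))
```

The key design point is that the Chinese pipeline stays in charge of the whole utterance and only delegates the English spans, so prosody across segment boundaries remains consistent.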
Hey, this is a must-have fix for text with mixed Chinese and English words. When can this PR be merged? There seem to be no code conflicts.
Has this code been merged yet? I also need support for mixed Chinese and English.
Could someone help merge it? Thanks!
Has the author confirmed that these changes fix mixed Chinese and English words, and is there a plan to merge this branch? Please make time for this.
@fireblade2534 is there any barrier blocking this PR? If not, could you please merge it? It's very important for scenarios with mixed Chinese and English words.
So why hasn't it been merged? This is very important! 🙏
Hi @ThatCoders, this PR has some merge conflicts, and we haven't gotten to it yet as we work through the backlog.
Got it, thank you for your reply and your contribution to open source.