fix Chinese and English mixing

Open chai51 opened this issue 10 months ago • 13 comments

The original code handles pure Chinese pronunciation very well, but with mixed Chinese and English text the English pronunciation is lost. I have improved this. However, the download_model.py update is missing because the release has no download address for Kokoro-82M-v1.1-zh. #214

chai51 avatar Mar 13 '25 07:03 chai51

@chai51 I'm not sure I understand what exactly this is trying to fix and how/what it fixes?

fireblade2534 avatar Mar 26 '25 14:03 fireblade2534

@fireblade2534

Purely a guess on my part, but I think this is a request for support for "Kokoro-82M-v1.1-zh" which is supposed to be better with at least Chinese, and which only handles Chinese and English.

RBEmerson970 avatar Mar 26 '25 14:03 RBEmerson970

With the Voice set to zf_xiaoyi and the Language set to Chinese, the problem is illustrated by the following two test cases:
"该模型是经过短期训练的结果,从专业数据集中添加了 100 名中文使用者。" ("This model is the result of a short training run that added 100 Chinese speakers from a professional dataset.") The synthesized pronunciation of this text is completely accurate.
"Kokoro 是一系列体积虽小但功能强大的 TTS 模型。" ("Kokoro is a series of small but powerful TTS models.") In this sentence, the pronunciations of "Kokoro" and "TTS" are incorrect.
Previously, mixed Chinese and English text would lose the pronunciation of the English part. I used the new Kokoro module to process English and Chinese separately, which solves the lost English pronunciation. Since mixing Chinese and English is very common, this improvement makes the Chinese voices usable in many more scenarios.

chai51 avatar Mar 27 '25 10:03 chai51

@chai51 So what you're saying is that if the language is Chinese, it loads kokoro v1.1 instead of the normal kokoro v1 model. Some questions:

  • What converts the Chinese and English text to phonemes in such a way that both the English and the Chinese pronunciation are maintained?
  • Does it load v1.1 and v1 at the same time?
  • Does it use the text normalization system for the English parts of the text, or is that skipped? (Text normalization only works for English right now, so it is automatically disabled if the lang code requests a different language.)

fireblade2534 avatar Mar 28 '25 16:03 fireblade2534

Yeah, you're right.

  • The submitted code at api/src/inference/kokoro_v1.py:87 passes a callback function to KPipeline, which separates the Chinese and English parts. At api/src/inference/kokoro_v1.py:61, the English part is returned by the callback function and synthesized using a-KPipeline. For the specific implementation, see make_zh.py.
  • Only v1.1 is loaded.
  • Since English only appears as individual words or abbreviations in mixed-pronunciation scenarios, text normalization is skipped for the English part.
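
To make the routing idea above concrete, here is a minimal, library-free sketch. It is not the code from this PR and it does not call Kokoro: `split_mixed`, `phonemize_mixed`, `zh_g2p`, and `en_callable` are hypothetical names, and the rule "a run of ASCII letters counts as English" is an assumption made only for illustration. The real callback and pipeline wiring live in the files referenced above.

```python
import re
from typing import Callable, List, Tuple

# Assumption: a run of ASCII letters (optionally joined by ' or -) is English.
_EN_RUN = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def split_mixed(text: str) -> List[Tuple[str, str]]:
    """Split text into ordered ("en", run) and ("zh", run) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in _EN_RUN.finditer(text):
        if match.start() > pos:
            # Everything between two English runs is left to the Chinese side.
            segments.append(("zh", text[pos:match.start()]))
        segments.append(("en", match.group()))
        pos = match.end()
    if pos < len(text):
        segments.append(("zh", text[pos:]))
    return segments


def phonemize_mixed(
    text: str,
    zh_g2p: Callable[[str], str],
    en_callable: Callable[[str], str],
) -> str:
    """Route each segment to the matching (hypothetical) phonemizer."""
    parts = []
    for lang, run in split_mixed(text):
        parts.append(en_callable(run) if lang == "en" else zh_g2p(run))
    return "".join(parts)


if __name__ == "__main__":
    # Stand-in phonemizers that only tag their input, so the routing is visible.
    demo = phonemize_mixed(
        "Kokoro 是一系列体积虽小但功能强大的 TTS 模型。",
        zh_g2p=lambda s: f"<zh:{s.strip()}>",
        en_callable=lambda s: f"<en:{s}>",
    )
    print(demo)
```

In this sketch the English words never reach the Chinese phonemizer, which is the property the callback approach is meant to guarantee. Text normalization is deliberately absent, matching the third point above.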

chai51 avatar Mar 31 '25 02:03 chai51

Hey, this is a must-have fix for text with mixed Chinese and English words. When can this PR be merged? It seems there are no code conflicts.

thiner avatar May 15 '25 07:05 thiner

Has the code been merged? I also need support for mixed Chinese and English.

fuyuhnag168 avatar Jun 10 '25 01:06 fuyuhnag168

Could someone help merge it? Thanks!

happynocode avatar Jul 19 '25 20:07 happynocode

Has the author reviewed the changes that try to fix mixed Chinese and English words, and is there a plan to merge the branch? Please take the time to look at this.

crazyn2 avatar Jul 22 '25 03:07 crazyn2

@fireblade2534 is anything blocking this PR? If not, could you please merge it? It's very important for scenarios with mixed Chinese and English words.

thiner avatar Jul 22 '25 04:07 thiner

So why hasn't it been merged? This is very important! 🙏

tovarsh avatar Nov 26 '25 17:11 tovarsh

Hi @ThatCoders, this PR has some merge conflicts, and we haven't gotten to it yet as we work through the backlog.

remsky avatar Nov 26 '25 17:11 remsky

Got it, thank you for your reply and your contribution to open source.

tovarsh avatar Nov 26 '25 17:11 tovarsh