Kokoro-FastAPI

How can I switch the model to hexgrad/Kokoro-82M-v1.1-zh? What should I do?

Open zhy844694805 opened this issue 9 months ago • 16 comments

How can I switch the model to hexgrad/Kokoro-82M-v1.1-zh? What should I do?

zhy844694805 avatar Mar 03 '25 16:03 zhy844694805

Wait until it's out of beta. They removed several voices for now until it's production ready if I understand correctly. Is there a particular reason you want to use this version? Does it have any exciting features I don't know about yet??

gitchat1 avatar Mar 04 '25 14:03 gitchat1

1. Get the resources

sudo apt install espeak-ng

git lfs install
cd api/src/models
git clone https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh
mv Kokoro-82M-v1.1-zh v1_1-zh

cp -r v1_1-zh/voices ../voices/v1_1-zh

2. Modify the code

default_voice: str = "af_heart"

Change this to the zf_094 voice sample so that Kokoro loads with Chinese.

        "v1_0/kokoro-v1_0.pth", description="PyTorch Kokoro V1 model filename"

Change to "v1_1-zh/kokoro-v1_1-zh.pth", description="PyTorch Kokoro V1 model filename"

    if language not in lang_map:
        raise ValueError(f"Unsupported language code: {language}")

    return EspeakBackend(lang_map[language])
    

Change to return EspeakBackend("cmn")

3. Modify the startup script

export VOICES_DIR=src/voices/v1_1-zh

4. Start using it

Open http://localhost:8880/web/. Under "Search voices...", pick any voice whose name starts with z, and change Language from Auto to Chinese.

chai51 avatar Mar 10 '25 09:03 chai51

Which file contains the line return EspeakBackend(lang_map[language])?

luoxxib avatar Mar 10 '25 15:03 luoxxib

I made the changes following your steps, but the tones in the generated speech are wrong.

luoxxib avatar Mar 10 '25 15:03 luoxxib

Apply these steps to the original GitHub project. Note first that this may break your existing environment.

1. Get the resources

sudo apt install espeak-ng

git lfs install
cd api/src/models
git clone https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh
mv Kokoro-82M-v1.1-zh v1_1-zh

cp -r v1_1-zh/voices ../voices/v1_1-zh

pip uninstall kokoro
pip install kokoro
pip install "misaki[zh]"
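A quick way to confirm that step 1 put everything where the later steps expect it is sketched below. It is only a sketch: the two paths are derived from the commands above, and the Git LFS check assumes that a clone made without git lfs leaves a small text pointer file in place of the real .pth weights. The names EXPECTED, missing_resources and is_lfs_pointer are hypothetical helpers, not part of Kokoro-FastAPI.

```python
from pathlib import Path

# Paths derived from the shell commands in step 1, relative to the repo root.
EXPECTED = (
    "api/src/models/v1_1-zh/kokoro-v1_1-zh.pth",
    "api/src/voices/v1_1-zh",
)

# First line of a Git LFS pointer file (what you get if LFS did not fetch the blob).
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def missing_resources(repo_root: str) -> list[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


def is_lfs_pointer(path: Path) -> bool:
    """Return True if path is a file that starts with the LFS pointer header."""
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(len(LFS_MAGIC)) == LFS_MAGIC
```

Calling missing_resources(".") from the repository root should return an empty list once the clone and copy have succeeded. If is_lfs_pointer reports True for the .pth file, the weights were probably not downloaded and the clone needs to be repeated after git lfs install.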

2. Modify the code

api/src/core/config.py

    allow_local_voice_saving: bool = (
        False  # Whether to allow saving combined voices locally
    )

Change to

    allow_local_voice_saving: bool = (
        False  # Whether to allow saving combined voices locally
    )
    repo_id: str = "hexgrad/Kokoro-82M"

api/src/core/model_config.py

    # Model filename
    pytorch_kokoro_v1_file: str = Field(
        "v1_0/kokoro-v1_0.pth", description="PyTorch Kokoro V1 model filename"
    )

Change to

    # Model filename
    pytorch_kokoro_v1_file: str = Field(
        "v1_1-zh/kokoro-v1_1-zh.pth", description="PyTorch Kokoro V1 model filename"
    )

api/src/inference/kokoro_v1.py

            # Block 1
            self._model = KModel(config=config_path, model=model_path).eval()

            # Block 2
            self._pipelines[lang_code] = KPipeline(
                lang_code=lang_code, model=self._model, device=self._device
            )

Change to

            # Block 1
            self._model = KModel(config=config_path, model=model_path, repo_id=settings.repo_id).eval()


            # Block 2
            self._pipelines[lang_code] = KPipeline(
                lang_code=lang_code, model=self._model, device=self._device, repo_id=settings.repo_id
            )

api/src/inference/model_manager.py

                warmup_text = "Warmup text for initialization."

Change to

                warmup_text = "初始化的预热文本。"

api/src/services/text_processing/phonemizer.py

    if language not in lang_map:
        raise ValueError(f"Unsupported language code: {language}")

    return EspeakBackend(lang_map[language])
    

Change to

    return EspeakBackend("cmn")
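Hard-coding "cmn" sends every language through the Mandarin backend, which is one reason the steps can break an existing setup. A less invasive variant, sketched here under assumptions, would add a Mandarin entry to the map and keep the validation. The contents of lang_map are not shown in this thread, so LANG_MAP below is a hypothetical stand-in, and using "z" as the Chinese language code is an assumption based on the z-prefixed voice names.

```python
# Hypothetical stand-in for lang_map in phonemizer.py; the real entries
# are not shown in this thread.
LANG_MAP = {"a": "en-us", "b": "en-gb"}


def resolve_espeak_language(language: str, lang_map: dict[str, str] = LANG_MAP) -> str:
    """Map a language code to an espeak-ng language name, adding Mandarin."""
    # Extend the map with Mandarin rather than replacing the lookup entirely.
    merged = {**lang_map, "z": "cmn"}
    if language not in merged:
        raise ValueError(f"Unsupported language code: {language}")
    return merged[language]
```

With this shape the original call would become EspeakBackend(resolve_espeak_language(language)), so English requests keep their existing backend while Chinese requests get "cmn".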

3. Modify the startup script

export VOICES_DIR=src/voices/v1_1-zh
export DEFAULT_VOICE=zf_094
export REPO_ID=hexgrad/Kokoro-82M-v1.1-zh
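The exports above only take effect if the settings class reads environment variables, which is how the repo_id field added in config.py can default to "hexgrad/Kokoro-82M" and still be overridden by REPO_ID. The sketch below imitates that behaviour with the standard library only. It assumes upper-cased field names are used as variable names, and the voices_dir default is a placeholder, not the project's real value.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class Settings:
    voices_dir: str = "voices/v1_0"     # placeholder default, assumed for illustration
    default_voice: str = "af_heart"     # default shown earlier in this thread
    repo_id: str = "hexgrad/Kokoro-82M"  # default added in config.py above


def load_settings(env=None) -> Settings:
    """Build Settings, letting upper-cased environment variables override defaults."""
    env = os.environ if env is None else env
    overrides = {
        f.name: env[f.name.upper()]
        for f in fields(Settings)
        if f.name.upper() in env
    }
    return Settings(**overrides)
```

For example, load_settings({"REPO_ID": "hexgrad/Kokoro-82M-v1.1-zh"}) keeps af_heart as the default voice but switches the repository id, mirroring what the three export lines do for the running server.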

4. Start using it

Open http://localhost:8880/web/
Under Search voices, pick any voice whose name starts with z
Change Language from Auto to Chinese
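The same check can be scripted instead of using the web page. The sketch below assumes the server exposes an OpenAI-style speech endpoint at /v1/audio/speech that accepts model, input, voice and response_format fields. That path and those field names are assumptions here and should be checked against the project's own documentation. build_speech_request is a hypothetical helper.

```python
import json
import urllib.request


def build_speech_request(text: str, voice: str = "zf_094",
                         base_url: str = "http://localhost:8880") -> urllib.request.Request:
    """Build (but do not send) a POST request for one piece of text."""
    payload = {
        "model": "kokoro",          # assumed model name
        "input": text,
        "voice": voice,             # any voice starting with z, per step 4
        "response_format": "mp3",   # assumed field name and value
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually fetch audio, pass the request to urllib.request.urlopen and write the response bytes to a file such as out.mp3. Keeping the builder separate from the network call makes it easy to inspect the payload when the output sounds wrong.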

chai51 avatar Mar 11 '25 05:03 chai51

Thanks. After following these steps, the pronunciation is correct.

luoxxib avatar Mar 11 '25 10:03 luoxxib

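@chai51 I was just playing with this when I saw your commit. After I cherry-pick your commit, what else do I need to do? Could you describe what has to be done after pulling the code directly?

I'm on a Mac and, for certain reasons, can't use Docker, so I'm running it directly.
download_model.py wouldn't run, so I manually downloaded the 1.0 model kokoro-v1_0.pth into Kokoro-FastAPI/api/src/models/v1_0. I didn't run sudo apt install espeak-ng because I'm on a Mac. It doesn't seem strictly necessary, but I'm not sure.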

fastfading avatar Mar 13 '25 08:03 fastfading

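As for what espeak-ng does, you can ask DeepSeek. After applying my commit, step 1 still requires putting the resources downloaded from Hugging Face in the corresponding location, and the variable names in step 3 have changed; see the newly added comments in start-gpu.sh for details. That should be about it. If anything else comes up, refer to the steps above and adjust as needed. I'm sure you'll manage.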

chai51 avatar Mar 13 '25 12:03 chai51

Because the model has value to English speakers, it would benefit more people to share this information in English.

RBEmerson970 avatar Mar 13 '25 14:03 RBEmerson970

With Kokoro-82M-v1.1-zh, the Chinese speech output sounds natural. With other versions, Chinese is pronounced with English intonation. To put it in perspective, it is like a Japanese person speaking English.

zhy844694805 avatar Mar 21 '25 13:03 zhy844694805

That's odd. After following these steps, why does all the Chinese speech sound like Northeastern (Dongbei) dialect?

SunixLiu avatar Apr 14 '25 10:04 SunixLiu

Pure Chinese works, but mixed Chinese and English text fails with the error Generation failed: 'ZHG2P' object has no attribute 'unk'
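When narrowing down which inputs trigger this error, it can help to check whether a string really mixes Han characters with Latin letters and to pull out the Latin runs. The helpers below are a hypothetical diagnostic sketch, unrelated to how misaki itself splits text, and they only consider the basic CJK block and ASCII letters.

```python
import re

HAN = re.compile(r"[\u4e00-\u9fff]")   # basic CJK Unified Ideographs block
LATIN = re.compile(r"[A-Za-z]")        # ASCII letters only

# Either a run of Latin words (joined by space, apostrophe or hyphen)
# or a run of anything that is not an ASCII letter.
RUNS = re.compile(r"[A-Za-z]+(?:[ '\-][A-Za-z]+)*|[^A-Za-z]+")


def is_mixed(text: str) -> bool:
    """True if text contains both Han characters and ASCII letters."""
    return bool(HAN.search(text)) and bool(LATIN.search(text))


def split_runs(text: str) -> list[str]:
    """Split text into alternating Latin and non-Latin runs, in order."""
    return RUNS.findall(text)
```

Feeding each run to the server separately is one way to see whether the failure comes from the English part, the Chinese part, or only from their combination.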

Jacknolfskin avatar Apr 29 '25 03:04 Jacknolfskin

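@Jacknolfskin ZHG2P is a module in misaki, so misaki needs to be updated. To update, bump the following two libraries in pyproject.toml to version 0.9.4: "kokoro==0.9.4", "misaki[en,ja,ko,zh]==0.9.4", then reinstall the Python dependencies.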
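After reinstalling, it is worth confirming that the environment really picked up the pinned versions, since a stale install would reproduce the same error. The sketch below uses importlib.metadata from the standard library. REQUIRED simply restates the two pins above, and installed_versions and mismatches are hypothetical helper names.

```python
from importlib import metadata

# The two pins mentioned above.
REQUIRED = {"kokoro": "0.9.4", "misaki": "0.9.4"}


def installed_versions(names) -> dict:
    """Look up installed versions; None means the package is not installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed: dict, required: dict = REQUIRED) -> dict:
    """Return {name: (installed, wanted)} for every package that is off the pin."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in required.items()
        if installed.get(name) != wanted
    }
```

Running mismatches(installed_versions(REQUIRED)) inside the project's environment should return an empty dict when both packages are at 0.9.4.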

remxcode avatar May 25 '25 16:05 remxcode

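After upgrading, the English parts still aren't spoken properly. Is there any way to get them read aloud? In mixed Chinese and English text, a missing segment makes the speech disjointed.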

leiax00 avatar Jul 01 '25 03:07 leiax00

https://github.com/remsky/Kokoro-FastAPI/pull/237

This solved my problem.

leiax00 avatar Jul 01 '25 03:07 leiax00

Did it fix the problem of the English being dropped?

cosyman avatar Oct 11 '25 03:10 cosyman