kokoro-onnx integration
https://github.com/thewh1teagle/kokoro-onnx
I am trying to implement it, but when the audio plays back the sound is completely broken. Honestly, it's too complex for me to get working.
Not working for me. The regular kokoro works just fine:

```python
import os

import numpy as np
import soundfile as sf
from loguru import logger

from .tts_interface import TTSInterface


class TTSEngine(TTSInterface):
    def __init__(self, sample_rate=24000):
        self.file_extension = "wav"
        self.new_audio_dir = "cache"
        self.sample_rate = sample_rate

        if not os.path.exists(self.new_audio_dir):
            os.makedirs(self.new_audio_dir)

        # Initialize the Kokoro pipeline once to reduce initial delay later.
        try:
            from kokoro import KPipeline

            self.pipeline = KPipeline(lang_code="a")
        except SystemExit:
            logger.error(
                "KPipeline initialization failed. Ensure spaCy is set up "
                "(e.g., 'pip install spacy' and 'python -m spacy download en_core_web_sm')."
            )
            self.pipeline = None
        except Exception as e:
            logger.error(f"KPipeline initialization error: {e}")
            self.pipeline = None

    def generate_audio(self, text: str, file_name_no_ext=None) -> str:
        """
        Generate a speech audio file using the Kokoro TTS engine.

        Parameters:
            text (str): Text to synthesize.
            file_name_no_ext (str): File name without extension.

        Returns:
            str: The path to the generated audio file, or None on failure.
        """
        file_name = self.generate_cache_file_name(file_name_no_ext, self.file_extension)
        try:
            audio_samples = self.synthesize(text)
            if audio_samples is None or len(audio_samples) == 0:
                logger.error("Kokoro TTS engine failed to generate audio.")
                return None
            sf.write(file_name, audio_samples, samplerate=self.sample_rate, subtype="PCM_16")
            return file_name
        except Exception as e:
            logger.critical(f"Error in generate_audio using Kokoro TTS: {e}")
            return None

    def synthesize(self, text: str):
        """
        Synthesize speech using the Kokoro KPipeline.

        Generates audio segments for the text, then concatenates them into a
        single audio signal.
        """
        try:
            if self.pipeline is None:
                logger.error("Kokoro pipeline is not initialized.")
                return None
            generator = self.pipeline(text, voice="af_heart", speed=1, split_pattern=r"\n+")
            audio_segments = []
            for _, _, audio in generator:
                # Convert torch tensors to numpy arrays if needed.
                audio_np = audio if isinstance(audio, np.ndarray) else audio.numpy()
                audio_segments.append(audio_np)
            if audio_segments:
                return np.concatenate(audio_segments)
            logger.error("Kokoro pipeline returned no audio segments.")
            return None
        except Exception as e:
            logger.error(f"Kokoro synthesize error: {e}")
            return None
```
There seems to be a problem between uv and spaCy that prevents me from properly setting up the kokoro package with uv. Once everything is set up with conda, the code above works.
I think it might be better to use thewh1teagle/kokoro-onnx instead of hexgrad/kokoro. We can check for and download the model files just like we did in sherpa_onnx_asr with the SenseVoiceSmall model; a sketch of that is below.
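For reference, a minimal sketch of what that could look like. The release URLs, file names, and voice name are assumptions taken from the kokoro-onnx README at the time of writing, so verify them against the project's releases page before relying on this. Note that `create()` returns the model's own sample rate; writing the file with a different rate is a common cause of the kind of broken playback described above.

```python
import os
import urllib.request

import soundfile as sf
from kokoro_onnx import Kokoro

# Assumed model assets from the kokoro-onnx releases page; check the
# actual release tag and file names before use.
MODEL_DIR = "models/kokoro"
ASSETS = {
    "kokoro-v1.0.onnx": "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx",
    "voices-v1.0.bin": "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin",
}


def ensure_models(model_dir: str = MODEL_DIR) -> tuple[str, str]:
    """Download the model and voices files on first use, like sherpa_onnx_asr does."""
    os.makedirs(model_dir, exist_ok=True)
    paths = []
    for name, url in ASSETS.items():
        path = os.path.join(model_dir, name)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths[0], paths[1]


model_path, voices_path = ensure_models()
kokoro = Kokoro(model_path, voices_path)
# create() returns both the samples and the sample rate the model actually
# produced; pass that rate to sf.write() instead of a hard-coded one.
samples, sample_rate = kokoro.create(
    "Hello from kokoro-onnx!", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
```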
The Chinese pronunciation is a bit weird for kokoro.
You need the Kokoro 1.1 zh model for that.
Just use https://github.com/remsky/Kokoro-FastAPI. It supports CUDA and Mac MLX GPU. The question is how to use this API in Open-LLM-VTuber; maybe we need a common API interface (see the sketch below).
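One way to get that common interface is to treat Kokoro-FastAPI as just another `TTSInterface` backend, since it exposes an OpenAI-compatible speech route. A rough sketch under those assumptions; the endpoint path, default port 8880, and request fields are taken from Kokoro-FastAPI's README and should be checked against the actual server config:

```python
import requests

# Assumed Kokoro-FastAPI endpoint (OpenAI-compatible /v1/audio/speech route).
KOKORO_FASTAPI_URL = "http://localhost:8880/v1/audio/speech"


def synthesize_via_fastapi(text: str, out_path: str = "audio.wav") -> str:
    """Hypothetical adapter: POST text to Kokoro-FastAPI, save the returned audio."""
    resp = requests.post(
        KOKORO_FASTAPI_URL,
        json={
            "model": "kokoro",
            "input": text,
            "voice": "af_heart",
            "response_format": "wav",
        },
        timeout=60,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # server returns encoded audio bytes
    return out_path
```

Wrapping this in a `TTSEngine(TTSInterface)` class like the one above would let the rest of the project stay agnostic about which Kokoro backend is running.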
Using thewh1teagle/kokoro-onnx seems to require numpy 2 or higher. When using numpy 1 with kokoro-onnx 0.3.3, I encountered a character parsing error; with numpy 2 there were no issues.
Thank you for the heads up. I will have to do some testing to ensure we can safely go to numpy 2 without breaking things.
By the way, the reason we even have to test this is that many of the ASR and TTS models in our project depend on additional packages not declared as project dependencies. Because some of them conflict with each other, I can't list them as optional dependencies either: uv would evaluate all of them together and fail to resolve.