
kokoro-onnx integration

Open t41372 opened this issue 11 months ago • 8 comments

https://github.com/thewh1teagle/kokoro-onnx

t41372 avatar Jan 11 '25 06:01 t41372

I am trying to implement it, but when the audio is played back the sound is completely broken. Honestly, it's too complex for me to get working.

WasamiKirua avatar Feb 08 '25 12:02 WasamiKirua
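(Editor's note: "completely broken" playback after synthesis is very often a sample-rate mismatch, e.g. audio generated at 24000 Hz but written or played back at 22050 Hz. A minimal resampling sketch using linear interpolation, assuming only NumPy is available; a real pipeline would likely use a proper resampler:)

```python
import numpy as np

def resample_linear(samples: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample a mono signal via linear interpolation (rough but dependency-free)."""
    if src_rate == dst_rate:
        return samples
    duration = len(samples) / src_rate
    n_out = int(round(duration * dst_rate))
    src_t = np.linspace(0.0, duration, num=len(samples), endpoint=False)
    dst_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(dst_t, src_t, samples)

# Example: 1 second of a 440 Hz sine at 24 kHz, resampled to 22.05 kHz
tone = np.sin(2 * np.pi * 440 * np.arange(24000) / 24000)
resampled = resample_linear(tone, 24000, 22050)
```

If the generated rate and the rate passed to `sf.write` disagree, the file plays too fast or too slow; matching them (or resampling as above) is the first thing to check.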

Not working for me either. The regular kokoro works just fine:

import os
from loguru import logger
import numpy as np
import soundfile as sf
from .tts_interface import TTSInterface

class TTSEngine(TTSInterface):
    def __init__(self, sample_rate=24000):
    # def __init__(self, sample_rate=22050):
        self.file_extension = "wav"
        self.new_audio_dir = "cache"
        self.sample_rate = sample_rate

        if not os.path.exists(self.new_audio_dir):
            os.makedirs(self.new_audio_dir)

        # Initialize the Kokoro pipeline once to reduce initial delay later.
        try:
            from kokoro import KPipeline
            self.pipeline = KPipeline(lang_code='a')
        except SystemExit:
            logger.error("KPipeline initialization failed. Ensure spaCy and its English model are installed (e.g., 'pip install spacy' and 'python -m spacy download en_core_web_sm').")
            self.pipeline = None
        except Exception as e:
            logger.error(f"KPipeline initialization error: {e}")
            self.pipeline = None

    def generate_audio(self, text: str, file_name_no_ext=None) -> str:
        """
        Generate a speech audio file using the Kokoro TTS engine.

        Parameters:
            text (str): Text to synthesize.
            file_name_no_ext (str): File name without extension.

        Returns:
            str: The path to the generated audio file.
        """
        file_name = self.generate_cache_file_name(file_name_no_ext, self.file_extension)
        try:
            # Synthesize audio samples with the Kokoro pipeline
            audio_samples = self.synthesize(text)
            if audio_samples is None or len(audio_samples) == 0:
                logger.error("Kokoro TTS engine failed to generate audio.")
                return None

            sf.write(file_name, audio_samples, samplerate=self.sample_rate, subtype="PCM_16")
            return file_name
        except Exception as e:
            logger.critical(f"Error in generate_audio using Kokoro TTS: {e}")
            return None

    def synthesize(self, text: str):
        """
        Synthesize speech using Kokoro KPipeline.
        This method uses the kokoro library to generate audio segments for the text,
        then concatenates them into a single audio signal.
        """
        try:
            if self.pipeline is None:
                logger.error("Kokoro pipeline is not initialized.")
                return None

            generator = self.pipeline(text, voice='af_heart', speed=1, split_pattern=r'\n+')
            audio_segments = []
            for _, _, audio in generator:
                # Convert audio to numpy if needed
                audio_np = audio if isinstance(audio, np.ndarray) else audio.numpy()
                audio_segments.append(audio_np)
            if audio_segments:
                combined = np.concatenate(audio_segments)
                return combined
            else:
                logger.error("Kokoro pipeline returned no audio segments.")
                return None
        except Exception as e:
            logger.error(f"Kokoro synthesize error: {e}")
            return None

melkeades avatar Feb 23 '25 23:02 melkeades

There seems to be a problem between uv and spaCy that prevents me from properly setting up the kokoro package with uv. Once everything is set up with conda, the code above works.

I think it might be better to use thewh1teagle/kokoro-onnx instead of hexgrad/kokoro. We can check/download the model file just like we did in sherpa_onnx_asr with the SenseVoice Small model.

t41372 avatar Feb 24 '25 18:02 t41372
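(Editor's note: the check/download flow suggested above could look something like this sketch; the paths and URL below are placeholders, not the actual kokoro-onnx release assets:)

```python
import os
import urllib.request

def ensure_file(path: str, url: str) -> str:
    """Download `url` to `path` if it is not already present, then return the path."""
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        print(f"Downloading {url} -> {path}")
        urllib.request.urlretrieve(url, path)
    return path

# Hypothetical usage -- the real asset names live in the kokoro-onnx releases:
# model_path = ensure_file("models/kokoro.onnx", "https://example.com/kokoro.onnx")
# voices_path = ensure_file("models/voices.bin", "https://example.com/voices.bin")
```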

The Chinese pronunciation is a bit weird for kokoro.

ml-inory avatar Feb 27 '25 17:02 ml-inory

You need the Kokoro 1.1 zh model.

fastfading avatar Apr 04 '25 10:04 fastfading

Just use https://github.com/remsky/Kokoro-FastAPI; it supports CUDA and Mac MLX GPU. The question is how to use this API in Open-LLM-VTuber. Maybe we need a common API interface.

fastfading avatar Apr 04 '25 10:04 fastfading
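(Editor's note: a "common API interface" along the lines the project already uses could be a small abstract base class that every backend, whether a local model or an HTTP service like Kokoro-FastAPI, implements. This is a hedged sketch, not the project's actual `TTSInterface`:)

```python
from abc import ABC, abstractmethod
from typing import Optional

class BaseTTS(ABC):
    """Minimal contract every TTS backend (local model or HTTP API) fulfills."""

    file_extension: str = "wav"

    @abstractmethod
    def generate_audio(self, text: str, file_name_no_ext: Optional[str] = None) -> Optional[str]:
        """Synthesize `text` and return the path to the audio file, or None on failure."""

class EchoTTS(BaseTTS):
    """Toy backend used here only to demonstrate the contract."""

    def generate_audio(self, text, file_name_no_ext=None):
        name = (file_name_no_ext or "out") + "." + self.file_extension
        # A real backend would synthesize and write audio here; this stub just returns the path.
        return name
```

With this shape, swapping hexgrad/kokoro, kokoro-onnx, or an HTTP client in and out only requires a new subclass.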

Using thewh1teagle/kokoro-onnx seems to require NumPy 2 or higher. When using NumPy 1 with kokoro-onnx 0.3.3, I encountered a character parsing error. With NumPy 2 there were no issues.

aki-colt avatar Apr 23 '25 10:04 aki-colt
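(Editor's note: a cheap guard for this kind of constraint is checking the installed NumPy major version at startup. Sketch only; the "NumPy >= 2" minimum is this thread's observation, not an officially documented requirement:)

```python
def major_version(version: str) -> int:
    """Return the leading major component of a version string like '2.1.3'."""
    return int(version.split(".")[0])

def numpy_ok_for_kokoro_onnx() -> bool:
    """Warn early instead of failing later with an obscure parsing error."""
    import numpy
    if major_version(numpy.__version__) < 2:
        print(f"kokoro-onnx reportedly needs NumPy >= 2, found {numpy.__version__}")
        return False
    return True
```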

Thank you for the heads up. I will have to do some testing to ensure we can safely move to NumPy 2 without breaking things.

By the way, the reason we even have to test this is that many of the ASR and TTS models in our project depend on additional packages that aren't declared in the project. Because some of them have dependency conflicts with each other, I can't list them as optional dependencies: uv would evaluate all of them and fail.

t41372 avatar Apr 25 '25 20:04 t41372