kokoro-onnx integration
https://github.com/thewh1teagle/kokoro-onnx
I am trying to implement it, but when the audio plays back the sound is completely broken. Honestly, it's too complex for me to get working.
Not working for me. The regular kokoro works just fine:

```python
import os

import numpy as np
import soundfile as sf
from loguru import logger

from .tts_interface import TTSInterface


class TTSEngine(TTSInterface):
    def __init__(self, sample_rate=24000):
        self.file_extension = "wav"
        self.new_audio_dir = "cache"
        self.sample_rate = sample_rate

        if not os.path.exists(self.new_audio_dir):
            os.makedirs(self.new_audio_dir)

        # Initialize the Kokoro pipeline once to reduce initial delay later.
        try:
            from kokoro import KPipeline

            self.pipeline = KPipeline(lang_code="a")
        except SystemExit:
            logger.error(
                "KPipeline initialization failed. Ensure spaCy is set up "
                "(e.g., 'pip install spacy' and 'python -m spacy download en_core_web_sm')."
            )
            self.pipeline = None
        except Exception as e:
            logger.error(f"KPipeline initialization error: {e}")
            self.pipeline = None

    def generate_audio(self, text: str, file_name_no_ext=None) -> str:
        """
        Generate a speech audio file using the Kokoro TTS engine.

        Parameters:
            text (str): Text to synthesize.
            file_name_no_ext (str): File name without extension.

        Returns:
            str: The path to the generated audio file, or None on failure.
        """
        file_name = self.generate_cache_file_name(file_name_no_ext, self.file_extension)
        try:
            audio_samples = self.synthesize(text)
            if audio_samples is None or len(audio_samples) == 0:
                logger.error("Kokoro TTS engine failed to generate audio.")
                return None
            sf.write(file_name, audio_samples, samplerate=self.sample_rate, subtype="PCM_16")
            return file_name
        except Exception as e:
            logger.critical(f"Error in generate_audio using Kokoro TTS: {e}")
            return None

    def synthesize(self, text: str):
        """
        Synthesize speech using the Kokoro KPipeline.

        Generates audio segments for the text, then concatenates them into a
        single audio signal.
        """
        try:
            if self.pipeline is None:
                logger.error("Kokoro pipeline is not initialized.")
                return None
            generator = self.pipeline(text, voice="af_heart", speed=1, split_pattern=r"\n+")
            audio_segments = []
            for _, _, audio in generator:
                # Convert torch tensors to numpy arrays if needed.
                audio_np = audio if isinstance(audio, np.ndarray) else audio.numpy()
                audio_segments.append(audio_np)
            if audio_segments:
                return np.concatenate(audio_segments)
            logger.error("Kokoro pipeline returned no audio segments.")
            return None
        except Exception as e:
            logger.error(f"Kokoro synthesize error: {e}")
            return None
```
There seems to be a problem between uv and spaCy that prevents me from properly setting up the kokoro package with uv. Once everything is set up with conda, the code above works.
I think it might be better to use thewh1teagle/kokoro-onnx instead of hexgrad/kokoro. We can check for and download the model files just like we did in sherpa_onnx_asr with the SenseVoiceSmall model; a sketch of that is below.
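For reference, a minimal sketch of what that could look like. The release URLs, file names, and voice name are assumptions taken from the kokoro-onnx README at the time of writing, so verify them against the project's releases page before relying on this. Note that `create()` returns the model's own sample rate; writing the file with a different rate is a common cause of the kind of broken playback described above.

```python
import os
import urllib.request

import soundfile as sf
from kokoro_onnx import Kokoro

# Assumed model assets from the kokoro-onnx releases page; check the
# actual release tag and file names before use.
MODEL_DIR = "models/kokoro"
ASSETS = {
    "kokoro-v1.0.onnx": "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx",
    "voices-v1.0.bin": "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin",
}


def ensure_models(model_dir: str = MODEL_DIR) -> tuple[str, str]:
    """Download the model and voices files on first use, like sherpa_onnx_asr does."""
    os.makedirs(model_dir, exist_ok=True)
    paths = []
    for name, url in ASSETS.items():
        path = os.path.join(model_dir, name)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths[0], paths[1]


model_path, voices_path = ensure_models()
kokoro = Kokoro(model_path, voices_path)
# create() returns both the samples and the sample rate the model actually
# produced; pass that rate to sf.write() instead of a hard-coded one.
samples, sample_rate = kokoro.create(
    "Hello from kokoro-onnx!", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
```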
The Chinese pronunciation is a bit weird for kokoro.
You need the Kokoro 1.1 zh model for that.
Just use https://github.com/remsky/Kokoro-FastAPI. It supports CUDA and Mac MLX GPU. The question is how to use this API in Open-LLM-VTuber; maybe we need a common API interface (see the sketch below).
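One way to get that common interface is to treat Kokoro-FastAPI as just another `TTSInterface` backend, since it exposes an OpenAI-compatible speech route. A rough sketch under those assumptions; the endpoint path, default port 8880, and request fields are taken from Kokoro-FastAPI's README and should be checked against the actual server config:

```python
import requests

# Assumed Kokoro-FastAPI endpoint (OpenAI-compatible /v1/audio/speech route).
KOKORO_FASTAPI_URL = "http://localhost:8880/v1/audio/speech"


def synthesize_via_fastapi(text: str, out_path: str = "audio.wav") -> str:
    """Hypothetical adapter: POST text to Kokoro-FastAPI, save the returned audio."""
    resp = requests.post(
        KOKORO_FASTAPI_URL,
        json={
            "model": "kokoro",
            "input": text,
            "voice": "af_heart",
            "response_format": "wav",
        },
        timeout=60,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # server returns encoded audio bytes
    return out_path
```

Wrapping this in a `TTSEngine(TTSInterface)` class like the one above would let the rest of the project stay agnostic about which Kokoro backend is running.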
Using thewh1teagle/kokoro-onnx seems to require numpy 2 or higher. When using numpy 1 with kokoro-onnx 0.3.3, I encountered a character parsing error; with numpy 2 there were no issues.
Thank you for the heads up. I will have to do some testing to ensure we can safely go to numpy 2 without breaking things.
By the way, the reason we even have to test this is that many of the ASR and TTS models in our project depend on additional packages not declared as project dependencies. Because some of them conflict with each other, I can't list them as optional dependencies either: uv would evaluate all of them together and fail to resolve.