llama.cpp Feature Request: Add support for Kokoro TTS

Prerequisites

[X] I am running the latest code. Mention the version if possible as well.
[X] I carefully followed the README.md.
[X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[X] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Devs, can you add support for Kokoro TTS? It's awesome in terms of accents and natural tone, considering its size. It is currently one of the most popular models in Pandroker's TTS Arena space on Hugging Face. Thanks! https://huggingface.co/hexgrad/Kokoro-82M

Motivation

Many people, including me, want to deploy it on CPU/edge devices.

Possible Implementation

No response

Jan 03 '25 05:01 broke-end-dev

+1

Jan 04 '25 09:01 darkzbaron

+1. The claim is that it's faster than realtime on the Mac.

Jan 05 '25 20:01 scalar27

+1

Jan 09 '25 20:01 logikstate

+1

Jan 09 '25 20:01 ggerganov

+1

Jan 10 '25 21:01 stopthinking102

+1

Jan 13 '25 19:01 OXKSA1

+1

Jan 13 '25 20:01 frankai

+1 🎯

Jan 13 '25 21:01 verioussmith

+1

Jan 13 '25 22:01 therealtimex

+1

Jan 14 '25 09:01 razorback16

+1

Jan 14 '25 09:01 KonstantinSelyuk

+1

Jan 14 '25 14:01 apepkuss

+2

Jan 15 '25 15:01 logikstate

+1

Jan 18 '25 04:01 signalstop

+1

Jan 18 '25 15:01 yoshuzx

+1

Jan 18 '25 16:01 henk717

+1. Would be cool to see more TTS options in llama.cpp.

Jan 19 '25 09:01 YorkieDev

These can be reproduced at https://hf.co/spaces/hexgrad/Kokoro-TTS without installing anything.

I'm sorry Dave, I'm afraid I can't do that.
Source: https://github.com/ggerganov/llama.cpp/pull/10784#issue-2733486635
Phonemes: ˌIm sˈɔɹi dˈAv, ˌIm əfɹˈAd ˌI kˈænt dˈu ðˈæt.

https://github.com/user-attachments/assets/d80f9d68-d7d4-4b84-bd7b-26c6ae87ad38

TTS requires 2 models to be provided: an LLM and a Vocoder. The first one generates audio codes (tokens) from the provided input text, based on some voice settings. The second one converts the audio codes into a spectrogram. The spectrogram is then converted back to audio with inverse FFT.
Source: https://github.com/ggerganov/llama.cpp/pull/10784#issuecomment-2536969458
Phonemes: tˌitˌiˈɛs ɹəkwˈIəɹz tˈu mˈɑdᵊlz tə bi pɹəvˈIdᵻd: ɐn ˌɛlˌɛlˈɛm ænd ɐ vˈOkˌOdəɹ. ðə fˈɜɹst wˈʌn ʤˈɛnəɹˌAts ˈɔdiO kˈOdz (tˈOkᵊnz) fɹʌm ðə pɹəvˈIdᵻd ˈɪnpˌʊt tˈɛkst, bˈAst ˌɔn sˌʌm vˈYs sˈɛTɪŋz. ðə sˈɛkənd wˈʌn kənvˈɜɹts ði ˈɔdiO kˈOdz ˈɪntu ɐ spˈɛktɹəɡɹˌæm. ðə spˈɛktɹəɡɹˌæm ɪz ðˈɛn kənvˈɜɹTᵻd bˈæk tʊ ˈɔdiO wɪð ˈɪnvˌɜɹs ˌɛfˌɛftˈi.

https://github.com/user-attachments/assets/7189a07a-2144-4815-a41c-aa0679bdefff
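For readers unfamiliar with the last step in the quoted description (spectrogram to audio via inverse FFT), here is a minimal Python sketch of the idea. It is not code from llama.cpp or Kokoro. The function names are hypothetical, the inverse DFT is a naive O(n^2) version using only the standard library, and it assumes unwindowed frames placed `hop` samples apart. A real vocoder would use an FFT library, a window function, and overlapping frames.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Naive inverse DFT of one complex frame; returns the real time-domain samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def forward_dft(frame):
    """Naive forward DFT, included only so the sketch can build a test spectrogram."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def overlap_add(frames, hop):
    """Sum time-domain frames into one signal, each frame offset by `hop` samples."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for j, sample in enumerate(frame):
            out[i * hop + j] += sample
    return out


def spectrogram_to_audio(spectrogram, hop):
    """Convert a list of complex spectrum frames back to a time-domain signal."""
    return overlap_add([inverse_dft(frame) for frame in spectrogram], hop)
```

With non-overlapping frames (`hop` equal to the frame length), running `forward_dft` on each frame of a signal and passing the result to `spectrogram_to_audio` recovers the original samples up to floating-point error.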

Not sure how to pass punctuation yet. Or even if this model supports it.
Source: https://github.com/ggerganov/llama.cpp/pull/10784#issuecomment-2536969458
Phonemes: nˌɑt ʃˈʊɹ hˌW tə pˈæs pˌʌŋkʧəwˈAʃən jˈɛt. ˌɔɹ ˈivən ɪf ðɪs mˈɑdᵊl səpˈɔɹts ɪt.

https://github.com/user-attachments/assets/4f3de736-7af5-4b07-bd3e-852478cc847e

Feb 01 '25 02:02 hexgrad

@hexgrad are those repros from a C++ implementation?

Feb 10 '25 21:02 namhkoh

@namhkoh No, it's Python & PyTorch, as I mentioned in https://github.com/ggerganov/llama.cpp/issues/11050#issuecomment-2628700821:

"These can be reproduced at https://hf.co/spaces/hexgrad/Kokoro-TTS without installing anything."

Feb 10 '25 22:02 hexgrad

There is an ONNX/C# implementation of Kokoro here: https://github.com/Lyrcaxis/KokoroSharp

But I think (not sure) it's using espeak as the phonemizer, which is different from how the Python & PyTorch version works? That one uses G2P?

Am I correct here, @hexgrad?

Feb 14 '25 15:02 logikstate

I am currently seeking a C++ implementation.

Feb 14 '25 16:02 namhkoh

You need G2P to make the whole thing work, but llama.cpp can probably disregard that piece for now. The C++ scope for llama.cpp would likely just be porting the modeling code in these 3 files:

https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/model.py
https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/modules.py
https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/istftnet.py
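
To illustrate why G2P can be treated as a separate front end, here is a toy Python sketch of what that stage does: it maps text to a phoneme string, which is what the modeling code then consumes. This is not Kokoro's actual G2P. The lexicon and the name `toy_g2p` are hypothetical, and the entries are copied from the "I'm sorry Dave" sample earlier in this thread. A real G2P needs a full lexicon plus a fallback for unknown words.

```python
import re

# Toy lexicon (assumption for illustration): entries taken from the
# "I'm sorry Dave" sample posted earlier in this thread.
TOY_LEXICON = {
    "i'm": "ˌIm",
    "sorry": "sˈɔɹi",
    "dave": "dˈAv",
    "afraid": "əfɹˈAd",
    "i": "ˌI",
    "can't": "kˈænt",
    "do": "dˈu",
    "that": "ðˈæt",
}


def toy_g2p(text):
    """Replace each known word with its phonemes; leave everything else unchanged."""
    # Split into word pieces (letters and apostrophes) and non-word pieces
    # (spaces, punctuation) so that punctuation passes through untouched.
    pieces = re.findall(r"[A-Za-z']+|[^A-Za-z']+", text)
    return "".join(TOY_LEXICON.get(piece.lower(), piece) for piece in pieces)


phonemes = toy_g2p("I'm sorry Dave, I'm afraid I can't do that.")
```

In this sketch the phoneme string is the only thing handed to the next stage, which is why the modeling code can be ported without the G2P piece.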

Feb 14 '25 18:02 hexgrad

"I am currently seeking a C++ implementation."

@namhkoh

We added Kokoro support to sherpa-onnx a long time ago.

It provides C++ APIs for Kokoro v0.19 and Kokoro 1.0, and it also supports 11 other programming languages, e.g., C, Java, Kotlin, Swift, Dart, C#, Go, JavaScript, Object Pascal, and Python.

You can find the usage doc at https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/kokoro.html