llama.cpp

Feature Request: Add support for Kokoro TTS

Open broke-end-dev opened this issue 11 months ago • 28 comments

Prerequisites

  • [X] I am running the latest code. Mention the version if possible as well.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Devs, can you add support for Kokoro TTS? It's awesome in terms of accents and natural tone, considering its size. It is currently one of the most popular models in Pendrokar's TTS Arena space on Hugging Face. Thanks! https://huggingface.co/hexgrad/Kokoro-82M

Motivation

Many people, including me, want to deploy it on CPU/edge devices.

Possible Implementation

No response

broke-end-dev, Jan 03 '25

+1

darkzbaron, Jan 04 '25

+1. The claim is that it's faster than realtime on the Mac.

scalar27, Jan 05 '25

+1

logikstate, Jan 09 '25

+1

ggerganov, Jan 09 '25

+1

stopthinking102, Jan 10 '25

+1

OXKSA1, Jan 13 '25

+1

frankai, Jan 13 '25

+1 🎯

verioussmith, Jan 13 '25

+1

therealtimex, Jan 13 '25

+1

razorback16, Jan 14 '25

+1

KonstantinSelyuk, Jan 14 '25

+1

apepkuss, Jan 14 '25

+2

logikstate, Jan 15 '25

+1

signalstop, Jan 18 '25

+1

yoshuzx, Jan 18 '25

+1

henk717, Jan 18 '25

+1. Would be cool to see more TTS options in llama.cpp.

YorkieDev, Jan 19 '25

These can be reproduced at https://hf.co/spaces/hexgrad/Kokoro-TTS without installing anything.

I'm sorry Dave, I'm afraid I can't do that. https://github.com/ggerganov/llama.cpp/pull/10784#issue-2733486635 ˌIm sˈɔɹi dˈAv, ˌIm əfɹˈAd ˌI kˈænt dˈu ðˈæt.

https://github.com/user-attachments/assets/d80f9d68-d7d4-4b84-bd7b-26c6ae87ad38

TTS requires 2 models to be provided: an LLM and a Vocoder. The first one generates audio codes (tokens) from the provided input text, based on some voice settings. The second one converts the audio codes into a spectrogram. The spectrogram is then converted back to audio with inverse FFT. https://github.com/ggerganov/llama.cpp/pull/10784#issuecomment-2536969458 tˌitˌiˈɛs ɹəkwˈIəɹz tˈu mˈɑdᵊlz tə bi pɹəvˈIdᵻd: ɐn ˌɛlˌɛlˈɛm ænd ɐ vˈOkˌOdəɹ. ðə fˈɜɹst wˈʌn ʤˈɛnəɹˌAts ˈɔdiO kˈOdz (tˈOkᵊnz) fɹʌm ðə pɹəvˈIdᵻd ˈɪnpˌʊt tˈɛkst, bˈAst ˌɔn sˌʌm vˈYs sˈɛTɪŋz. ðə sˈɛkənd wˈʌn kənvˈɜɹts ði ˈɔdiO kˈOdz ˈɪntu ɐ spˈɛktɹəɡɹˌæm. ðə spˈɛktɹəɡɹˌæm ɪz ðˈɛn kənvˈɜɹTᵻd bˈæk tʊ ˈɔdiO wɪð ˈɪnvˌɜɹs ˌɛfˌɛftˈi.
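The last stage described in that sample text (spectrogram back to audio with an inverse FFT) is usually an inverse short-time Fourier transform: each frame's magnitude and phase become complex bins, each frame is inverse-FFT'd, windowed, and overlap-added. Kokoro's `istftnet.py` decoder has to end the same way in any C++ port. The sketch below is a minimal pure-Python illustration, assuming a periodic Hann window and naive O(n²) DFTs instead of a real FFT library; all function names are hypothetical and not taken from llama.cpp or Kokoro.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window (a common STFT choice; an assumption here)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def rfft(frame):
    """Naive DFT of a real frame, keeping bins 0..n//2 (O(n^2), for clarity only)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def irfft(half_spectrum, n):
    """Inverse of rfft: rebuild n real samples from n//2 + 1 complex bins."""
    # A real signal's spectrum is Hermitian: X[n - k] == conj(X[k]).
    full = list(half_spectrum) + [half_spectrum[n - k].conjugate()
                                  for k in range(len(half_spectrum), n)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def stft(signal, n_fft, hop):
    """Windowed frames -> per-frame magnitude and phase (what a vocoder would predict)."""
    window = hann(n_fft)
    mags, phases = [], []
    for start in range(0, len(signal) - n_fft + 1, hop):
        bins = rfft([signal[start + i] * window[i] for i in range(n_fft)])
        mags.append([abs(b) for b in bins])
        phases.append([cmath.phase(b) for b in bins])
    return mags, phases


def istft(mags, phases, n_fft, hop):
    """Magnitude + phase -> complex bins -> inverse FFT -> windowed overlap-add."""
    window = hann(n_fft)
    length = n_fft + hop * (len(mags) - 1)
    out = [0.0] * length
    norm = [0.0] * length
    for f, (mag, ph) in enumerate(zip(mags, phases)):
        frame = irfft([cmath.rect(m, p) for m, p in zip(mag, ph)], n_fft)
        start = f * hop
        for i in range(n_fft):
            out[start + i] += frame[i] * window[i]
            norm[start + i] += window[i] ** 2
    # Undo the summed squared window; samples the window never covers stay at zero.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

Round-tripping a 32-sample test signal through `stft` and then `istft` (with `n_fft=16`, `hop=4`) recovers it everywhere except the very first sample, where the Hann window is exactly zero. In an actual vocoder, `mags` and `phases` would come from the network rather than from `stft`.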

https://github.com/user-attachments/assets/7189a07a-2144-4815-a41c-aa0679bdefff

Not sure how to pass punctuation yet. Or even if this model supports it. https://github.com/ggerganov/llama.cpp/pull/10784#issuecomment-2536969458 nˌɑt ʃˈʊɹ hˌW tə pˈæs pˌʌŋkʧəwˈAʃən jˈɛt. ˌɔɹ ˈivən ɪf ðɪs mˈɑdᵊl səpˈɔɹts ɪt.

https://github.com/user-attachments/assets/4f3de736-7af5-4b07-bd3e-852478cc847e

hexgrad, Feb 01 '25

@hexgrad are those repros with a C++ implementation?

namhkoh, Feb 10 '25

@namhkoh No, it's Python & PyTorch, as I mentioned https://github.com/ggerganov/llama.cpp/issues/11050#issuecomment-2628700821

These can be reproduced at https://hf.co/spaces/hexgrad/Kokoro-TTS without installing anything.

hexgrad, Feb 10 '25

There is an ONNX/C# implementation of Kokoro here: https://github.com/Lyrcaxis/KokoroSharp

But I think (not sure) it's using eSpeak as the phonemizer? Is that different from how the Python & PyTorch version works, which uses G2P?

Am I correct here? @hexgrad ?
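G2P (grapheme-to-phoneme conversion) is the step that turns raw text into phoneme strings like the ones quoted earlier in this thread; eSpeak is one engine that can perform it. A common design is dictionary lookup first, with a fallback for out-of-vocabulary words. The toy sketch below assumes that design; the lexicon, the regex tokenizer, and the `fallback` hook are hypothetical and not taken from Kokoro, misaki, or KokoroSharp.

```python
import re

# Hypothetical mini-lexicon: entries copied from the phoneme strings quoted in this thread.
LEXICON = {"i'm": "ˌIm", "sorry": "sˈɔɹi", "dave": "dˈAv"}

TOKEN = re.compile(r"[a-z']+|[.,!?;:]")
PUNCT = set(".,!?;:")


def phonemize(text, lexicon, fallback):
    """Dictionary-first G2P: known words come from the lexicon, unknown words go to `fallback`."""
    out = []
    for tok in TOKEN.findall(text.lower()):
        if tok in PUNCT:
            # Keep punctuation attached to the previous word, as in the strings above.
            if out:
                out[-1] += tok
            else:
                out.append(tok)
        elif tok in lexicon:
            out.append(lexicon[tok])
        else:
            # Out-of-vocabulary word: an engine such as eSpeak could be plugged in here.
            out.append(fallback(tok))
    return " ".join(out)
```

For example, `phonemize("I'm sorry Dave,", LEXICON, str.upper)` returns `"ˌIm sˈɔɹi dˈAv,"`. How well the fallback matches the lexicon's phoneme conventions is exactly where two phonemizers can diverge.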

logikstate, Feb 14 '25

I am currently seeking a c++ implementation.

namhkoh, Feb 14 '25

You need G2P to make the whole thing work, but llama.cpp can probably disregard that piece for now; the C++ scope for llama.cpp would likely just be porting the modeling code in these 3 files:

  1. https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/model.py
  2. https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/modules.py
  3. https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/istftnet.py
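Part of what that modeling code has to do is go from one feature vector per phoneme to one vector per audio frame. The Kokoro-82M model card describes the model as building on the StyleTTS 2 architecture with an iSTFTNet decoder. In that family, a duration predictor estimates how many frames each phoneme lasts, and the per-phoneme features are expanded accordingly. The sketch below assumes that design and expresses the expansion as an explicit 0/1 alignment matrix, which maps naturally onto a matrix multiply in a ggml-based port. The function names are hypothetical, and the exact rounding in `model.py` may differ.

```python
def alignment_matrix(durations):
    """0/1 matrix of shape (num_tokens, num_frames); aln[t][j] == 1 when frame j belongs to token t."""
    # Round each predicted duration and clamp to >= 1 frame so no phoneme is dropped.
    counts = [max(1, round(d)) for d in durations]
    aln = [[0.0] * sum(counts) for _ in counts]
    frame = 0
    for tok, n in enumerate(counts):
        for _ in range(n):
            aln[tok][frame] = 1.0
            frame += 1
    return aln


def expand(features, aln):
    """Per-token features (tokens x dims) -> per-frame features (frames x dims).

    Computed as a matrix product with the alignment matrix, the form a
    tensor library would most likely use.
    """
    num_tokens, num_frames, dims = len(features), len(aln[0]), len(features[0])
    return [[sum(features[t][d] * aln[t][j] for t in range(num_tokens)) for d in range(dims)]
            for j in range(num_frames)]
```

With predicted durations `[1.4, 0.2, 2.6]`, the tokens get 1, 1 (clamped up from 0), and 3 frames, so three per-token vectors expand into five per-frame vectors.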

hexgrad, Feb 14 '25

I am currently seeking a c++ implementation.

@namhkoh

We added Kokoro support to sherpa-onnx a long time ago.

It provides not only C++ APIs for Kokoro v0.19 and Kokoro 1.0, but also APIs for 11 other programming languages, e.g., C, Java, Kotlin, Swift, Dart, C#, Go, JavaScript, Object Pascal, and Python.

You can find the usage doc at https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/kokoro.html

csukuangfj, Feb 17 '25

Is there any update on this?

Yossef-Dawoad, Mar 17 '25

+1

Doraemonwei, Mar 25 '25

+1

gavine99, Apr 25 '25

+1

06opoTeHb, Apr 26 '25

+1

Disonantemus, May 22 '25

+1

babybirdprd, May 23 '25