Is it possible to read a word with rising intonation? For example, to read 'apple' out as 'apple?'

Open bk111 opened this issue 9 months ago • 13 comments

Is it possible to read a word with rising intonation? For example, to read 'apple' out as 'apple?'

bk111 avatar Apr 08 '25 13:04 bk111

If putting in "apple?" doesn't do it, then likely no. You could try using custom phonemes or stress/intonation as described at the bottom of this: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

fireblade2534 avatar Apr 09 '25 14:04 fireblade2534

> If putting in "apple?" doesn't do it, then likely no. You could try using custom phonemes or stress/intonation as described at the bottom of this: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

Sorry, could you give me more guidance? For example, a 1-syllable word like 'take', a 2-syllable word like 'donkey', a 3-syllable word like 'example', or longer words. How do I get an MP3 with rising intonation, as if a native speaker were asking a yes/no question with the key word at the end?

bk111 avatar Apr 10 '25 08:04 bk111

@bk111 Copied from the description of the space:

💡 Customize pronunciation with Markdown link syntax and /slashes/ like [Kokoro](/kˈOkəɹO/)

💬 To adjust intonation, try punctuation ;:,.!?—…"()“” or stress ˈ and ˌ

⬇️ Lower stress [1 level](-1) or [2 levels](-2)

⬆️ Raise stress 1 level [or](+2) 2 levels (only works on less stressed, usually short words)
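
For what it's worth, here is a minimal Python sketch of how that markup could be generated programmatically instead of typed by hand. The helper names `stress` and `phonemes` are hypothetical (they are not part of Kokoro or Kokoro-FastAPI); they only build the bracket-and-parenthesis strings from the description above. Whether a given Kokoro-FastAPI build interprets the markup the same way as the Hugging Face space is an assumption you would need to check.

```python
def stress(word: str, levels: int) -> str:
    """Wrap `word` in the stress markup from the space description.

    Hypothetical helper: levels must be -2, -1, +1 or +2, matching the
    "[1 level](-1)" / "[or](+2)" examples quoted above.
    """
    if levels == 0 or not -2 <= levels <= 2:
        raise ValueError("levels must be one of -2, -1, 1, 2")
    # {levels:+d} always prints the sign, so 2 becomes "+2" and -1 stays "-1".
    return f"[{word}]({levels:+d})"


def phonemes(word: str, ipa: str) -> str:
    """Wrap `word` with a custom pronunciation, e.g. [Kokoro](/kˈOkəɹO/)."""
    # Accept the phoneme string with or without surrounding slashes.
    return f"[{word}](/{ipa.strip('/')}/)"


text = f"Is it an {stress('apple', 1)}?"
print(text)
```

Note that the square brackets around the word are part of the syntax: a bare `(-1)` in the text has nothing to attach to.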

fireblade2534 avatar Apr 10 '25 13:04 fireblade2534

I was kind of wondering the same but I cannot really get it to work any differently when I include the punctuation. Is there any part in the documentation where this is specified?

silgon avatar Apr 13 '25 23:04 silgon

> I was kind of wondering the same but I cannot really get it to work any differently when I include the punctuation. Is there any part in the documentation where this is specified?

Do you need a word with rising intonation? What for? Maybe write a sentence such as "Can I call you take?" to get the rising 'take'?

bk111 avatar Apr 14 '25 12:04 bk111

Well, in my case I was just playing with it; it's not that I need it. What I would like, though, is more control over the pause between sentences: I would like to make it a bit longer. I tried the options cited in the Hugging Face space referenced by @fireblade2534, but with no success.
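
One possible workaround, sketched under assumptions: synthesize each sentence separately and join the audio with explicit silence. The function below is hypothetical and only handles the joining step. It assumes each chunk is raw 16-bit mono PCM and that the sample rate is 24000 Hz; check both against what your setup actually returns.

```python
def join_with_pause(
    chunks: list[bytes],
    pause_seconds: float = 0.6,
    sample_rate: int = 24000,  # assumption: adjust to your audio's real rate
    sample_width: int = 2,     # bytes per sample (2 = 16-bit PCM, mono)
) -> bytes:
    """Concatenate raw PCM chunks with a fixed silence gap between them."""
    # Digital silence for signed PCM is all-zero bytes.
    silence = b"\x00" * (int(pause_seconds * sample_rate) * sample_width)
    # bytes.join puts the separator only between chunks, not at the ends.
    return silence.join(chunks)
```

The result is still headerless PCM, so it would need to be wrapped (for example with Python's `wave` module) before being saved as a playable file.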

silgon avatar Apr 15 '25 04:04 silgon

@silgon I managed to install Kokoro TTS on my VPS using the CPU. It's really fast, but intonation doesn't work except on the Hugging Face space. Question marks and the like have no effect locally, but on the HF space they work fine. Am I doing anything wrong? I've successfully integrated kokoro-fastapi into my n8n workflow.

gab-luz avatar Apr 21 '25 17:04 gab-luz

> @bk111 Copied from the description of the space:
>
> 💡 Customize pronunciation with Markdown link syntax and /slashes/ like [Kokoro](/kˈOkəɹO/)
>
> 💬 To adjust intonation, try punctuation ;:,.!?—…"()“” or stress ˈ and ˌ
>
> ⬇️ Lower stress [1 level](-1) or [2 levels](-2)
>
> ⬆️ Raise stress 1 level [or](+2) 2 levels (only works on less stressed, usually short words)

I don't understand what this means. If I put (-1) in the text, it just reads "minus one".

MarcoLavoro avatar Apr 24 '25 19:04 MarcoLavoro

> huggingface

On Hugging Face (https://huggingface.co/spaces/hexgrad/Kokoro-TTS), with the input 'Is it an apple?', the audio has no rising intonation. Have you found another TTS solution with normal intonation?

bk111 avatar May 05 '25 02:05 bk111

> @silgon I managed to install Kokoro TTS on my VPS using the CPU. It's really fast, but intonation doesn't work except on the Hugging Face space. Question marks and the like have no effect locally, but on the HF space they work fine. Am I doing anything wrong? I've successfully integrated kokoro-fastapi into my n8n workflow.

Did you solve this locally?

MarcoLavoro avatar May 25 '25 16:05 MarcoLavoro

In general, I don't find Kokoro completely expressive. There is some expression, but it's not very close to actual speech. Used as a verbal "proofreader" on my writing, it works to spot mechanical errors, awkward wording, etc. It doesn't do well with questions, em dashes, question marks, or exclamation marks.

OTOH, the TTS options which handle expression better either will emit only far too short examples, or cost more than they're worth. I had hopes Kokoro itself might evolve, but I think it's gone as far as it's going to go. So I'll mark time until the next really useful alternative shows up, using this repo in the meantime.

RBEmerson970 avatar May 25 '25 16:05 RBEmerson970

Have you tried editing the /app/api/src/services/text_processing/normalizer.py file?

I've managed to do a lot and fill in a lot of the blanks using phonemes in the normalizer file, by catching patterns with regular expressions and converting them into their phoneme counterparts.

It might be worth a try.

You could capture the last word and the question mark using a regex and automatically add the intonation markup to match.
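
As a rough sketch of that idea (this is not the actual contents of normalizer.py): the function below finds the word directly before each question mark and wraps it in the `[word](+1)` stress markup from the space description. `mark_question_word` is a hypothetical name, and whether raising stress on that word actually produces a rising contour is untested here.

```python
import re

# A word (letters, apostrophes, hyphens) directly followed by optional
# whitespace and a question mark. Group 1 is the word, group 2 the rest.
_WORD_BEFORE_QUESTION = re.compile(r"([A-Za-z][A-Za-z'-]*)(\s*\?)")


def mark_question_word(text: str, level: int = 1) -> str:
    """Wrap the word before each '?' in stress markup, e.g. [apple](+1)?

    Hypothetical helper for illustration; it is not a Kokoro-FastAPI function.
    """
    def _wrap(match: re.Match) -> str:
        word, tail = match.group(1), match.group(2)
        # {level:+d} always prints the sign: 1 -> "+1".
        return f"[{word}]({level:+d}){tail}"

    return _WORD_BEFORE_QUESTION.sub(_wrap, text)


print(mark_question_word("Is it an apple?"))
```

Text without a question mark passes through unchanged, so a hook like this could run over every input.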

digitalassassins avatar May 28 '25 04:05 digitalassassins

We need "?" intonation :)

martinezvl avatar Jul 21 '25 16:07 martinezvl