Kokoro-FastAPI: Is it possible to read a word with rising intonation? For example, to read 'apple' out as 'apple?'

Is it possible to read a word with rising intonation? For example, to read 'apple' out as 'apple?'

Apr 08 '25 13:04 bk111

If putting in "apple?" doesn't do it, then likely no. You could try using custom phonemes or stress/intonation markers as described at the bottom of this: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

Apr 09 '25 14:04 fireblade2534

If putting in "apple?" doesn't do it, then likely no. You could try using custom phonemes or stress/intonation markers as described at the bottom of this: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

Sorry, could you give me more guidance? For example, take a one-syllable word like 'take', a two-syllable word like 'donkey', a three-syllable word like 'example', or a longer word. How do I get an MP3 with rising intonation, as if a native speaker were asking a yes/no question with the key word at the end?

Apr 10 '25 08:04 bk111

@bk111 Copied from the description of the space:

💡 Customize pronunciation with Markdown link syntax and /slashes/ like [Kokoro](/kˈOkəɹO/)

💬 To adjust intonation, try punctuation ;:,.!?—…"()“” or stress ˈ and ˌ

⬇️ Lower stress [1 level](-1) or [2 levels](-2)

⬆️ Raise stress 1 level [or](+2) 2 levels (only works on less stressed, usually short words)
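
The markup quoted above can be generated programmatically. The following is a minimal sketch, assuming the syntax behaves as the space's description says; the helper names `with_phonemes` and `shift_stress` are hypothetical and not part of Kokoro or Kokoro-FastAPI. Nothing in this thread confirms that a given Kokoro-FastAPI build interprets this markup the same way the Hugging Face space does.

```python
def with_phonemes(word: str, phonemes: str) -> str:
    """Build the Markdown-link phoneme override, e.g. [Kokoro](/kˈOkəɹO/)."""
    return f"[{word}](/{phonemes}/)"


def shift_stress(word: str, levels: int) -> str:
    """Build the stress markup, e.g. [or](+2) or [1 level](-1).

    Only the four shifts named in the space's description are accepted.
    """
    if levels not in (-2, -1, 1, 2):
        raise ValueError("levels must be -2, -1, 1 or 2")
    # The "+d" format spec always emits an explicit sign: +1, +2, -1, -2.
    return f"[{word}]({levels:+d})"


# Example input text (whether it changes the audio is untested here).
text = "Is it " + shift_stress("an", 1) + " apple?"
```

The resulting string would be sent as the normal text input. If the server reads "(+1)" aloud as words, the markup is probably not being parsed by that build.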

Apr 10 '25 13:04 fireblade2534

I was kind of wondering the same but I cannot really get it to work any differently when I include the punctuation. Is there any part in the documentation where this is specified?

Apr 13 '25 23:04 silgon

I was kind of wondering the same but I cannot really get it to work any differently when I include the punctuation. Is there any part in the documentation where this is specified?

Do you need the word itself with rising intonation? What for? Maybe write a sentence like "Can I call you take?" to get the rising 'take'.

Apr 14 '25 12:04 bk111

Well, in my case I was just playing with it; it's not that I need it. What I would like, though, is more control over the pause between sentences. I would like to make it a bit longer. I tried the options listed in the Hugging Face space referenced by @fireblade2534, but with no success.
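
One possible workaround for longer pauses, not tried by anyone in this thread and offered only as a sketch: synthesize each sentence separately and join the audio with explicit silence. The code below assumes raw mono 16-bit PCM at 24 kHz (an assumption to check against your own output), and all function names are hypothetical. The synthesis step itself is not shown.

```python
import re

SAMPLE_RATE = 24_000  # assumed sample rate; verify against your audio
SAMPLE_WIDTH = 2      # bytes per sample for 16-bit PCM (assumed)


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (a naive heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def silence(ms: int) -> bytes:
    """Return `ms` milliseconds of mono PCM silence (all-zero samples)."""
    n_samples = SAMPLE_RATE * ms // 1000
    return b"\x00" * (n_samples * SAMPLE_WIDTH)


def join_with_pauses(chunks: list[bytes], pause_ms: int) -> bytes:
    """Concatenate PCM chunks with `pause_ms` of silence between them."""
    return silence(pause_ms).join(chunks)
```

Each string from `split_sentences` would be synthesized on its own, and the resulting PCM chunks passed to `join_with_pauses`. Abbreviations such as "e.g. this" will be split incorrectly by this simple pattern.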

Apr 15 '25 04:04 silgon

@silgon I managed to install Kokoro TTS on my VPS using the CPU. It's really fast, but intonation doesn't work: question marks and similar punctuation have no effect, while on the Hugging Face space they work fine. Am I doing anything wrong? I've successfully integrated kokoro-fastapi into my n8n workflow.

Apr 21 '25 17:04 gab-luz

@bk111 Copied from the description of the space:

💡 Customize pronunciation with Markdown link syntax and /slashes/ like [Kokoro](/kˈOkəɹO/)

💬 To adjust intonation, try punctuation ;:,.!?—…"()“” or stress ˈ and ˌ

⬇️ Lower stress [1 level](-1) or [2 levels](-2)

⬆️ Raise stress 1 level [or](+2) 2 levels (only works on less stressed, usually short words)

I don't understand what this means. If I put (-1) in the text, it just reads it as "minus one".

Apr 24 '25 19:04 MarcoLavoro

On the Hugging Face space, https://huggingface.co/spaces/hexgrad/Kokoro-TTS, I entered 'Is it an apple?', but the audio has no rising intonation. Did you find another TTS solution with normal intonation?

May 05 '25 02:05 bk111

@silgon I managed to install Kokoro TTS on my VPS using the CPU. It's really fast, but intonation doesn't work: question marks and similar punctuation have no effect, while on the Hugging Face space they work fine. Am I doing anything wrong? I've successfully integrated kokoro-fastapi into my n8n workflow.

Did you solve this on your local install?

May 25 '25 16:05 MarcoLavoro

In general, I don't find Kokoro completely expressive. There is some expression, but it's not very close to actual speech. Used as a verbal "proofreader" on my writing, it works to spot mechanical errors, awkward wording, etc. It doesn't do well with questions, em dashes, question marks, or exclamation marks.

OTOH, the TTS options that handle expression better either emit only far-too-short samples or cost more than they're worth. I had hopes Kokoro itself might evolve, but I think it's gone as far as it's going to go. So I'll mark time until the next really useful alternative shows up, using this repo in the meantime.

May 25 '25 16:05 RBEmerson970

Have you tried editing the /app/api/src/services/text_processing/normalizer.py file?

I've managed to do a lot and fill in many of the gaps using phonemes in the normalizer file, by matching regular expressions and converting the matches into their phoneme counterparts.

It might be worth a try.

You could capture the last word and the question mark using a regex and automatically add the intonation markup to match.
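
A minimal sketch of that idea, assuming the stress markup quoted earlier in the thread (`[word](+1)`) is what gets added. The function name `mark_question_words` is hypothetical, this is not the actual content of normalizer.py, and whether the markup produces an audible rise is untested here.

```python
import re

# A word (letters, digits, apostrophes) immediately followed by one or more "?".
WORD_BEFORE_QUESTION = re.compile(r"(\b[\w']+)(\?+)")


def mark_question_words(text: str, levels: int = 1) -> str:
    """Wrap each word that directly precedes '?' in raise-stress markup."""
    def wrap(match: re.Match) -> str:
        word, marks = match.group(1), match.group(2)
        # "+d" keeps the explicit sign, producing e.g. (+1).
        return f"[{word}]({levels:+d}){marks}"

    return WORD_BEFORE_QUESTION.sub(wrap, text)


print(mark_question_words("Is it an apple?"))
```

Text that is already wrapped, such as `[apple](+1)?`, is left alone because the character before the question mark is a closing parenthesis and not part of a word.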

May 28 '25 04:05 digitalassassins

We need "?" intonation :)

Jul 21 '25 16:07 martinezvl