Is it possible to read a word with rising intonation? For example, to read 'apple' as 'apple?'
If putting in "apple?" doesn't do it, then probably not. You could try using custom phonemes or stress/intonation as described at the bottom of this page: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
Sorry, could you give me more guidance? For example, a one-syllable word like 'take', a two-syllable word like 'donkey', a three-syllable word like 'example', and words with more syllables. How do I get an MP3 with rising intonation, as if a native speaker were asking a yes/no question with the key word at the end?
@bk111 Copied from the description of the space:
💡 Customize pronunciation with Markdown link syntax and /slashes/ like [Kokoro](/kˈOkəɹO/)
💬 To adjust intonation, try punctuation ;:,.!?—…"()“” or stress ˈ and ˌ
⬇️ Lower stress [1 level](-1) or [2 levels](-2)
⬆️ Raise stress 1 level [or](+2) 2 levels (only works on less stressed, usually short words)
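A minimal sketch of building input text with the pronunciation syntax quoted above. The helper name `with_pronunciation` is hypothetical and only formats a string; the phoneme string is the one from the space's own example. Whether a particular local install (for example kokoro-fastapi) honours this syntax the same way the space does is an assumption worth checking.

```python
def with_pronunciation(word: str, phonemes: str) -> str:
    """Return [word](/phonemes/), the Markdown-link form for a custom pronunciation.

    Hypothetical helper: it only builds text, it does not call Kokoro.
    """
    return f"[{word}](/{phonemes}/)"


# Phoneme string copied from the Hugging Face space's description.
text = f"Have you tried {with_pronunciation('Kokoro', 'kˈOkəɹO')}?"
print(text)  # Have you tried [Kokoro](/kˈOkəɹO/)?
```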
I was wondering the same, but I can't get it to behave any differently when I include punctuation. Is this specified anywhere in the documentation?
Do you need the word with rising intonation? What for? Maybe build a sentence like "Can I call you take?" to get a rising 'take'.
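The carrier-sentence idea can be sketched as generating a few yes/no questions that end in the target word, synthesizing each, and keeping whichever rises best. "Can I call you {word}?" comes from the suggestion above; the second template and the `carrier_questions` name are hypothetical. This does not guarantee a rise (the 'Is it an apple?' report further down says it may not), and cutting the final word out of the audio would need a separate step not shown here.

```python
# Templates that put the target word last in a yes/no question.
# The first is from the suggestion above; the second is a hypothetical extra.
CARRIERS = (
    "Can I call you {word}?",
    "Did you say {word}?",
)


def carrier_questions(word: str) -> list[str]:
    """Return one carrier question per template, each ending in `word`."""
    return [template.format(word=word) for template in CARRIERS]


for w in ("take", "donkey", "example"):
    print(carrier_questions(w))
```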
Well, in my case I was just playing with it; I don't actually need it. What I would like, though, is more control over the pause between sentences; I'd like to make it a bit longer. I tried the options listed in the Hugging Face space referenced by @fireblade2534, but with no success.
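One workaround, if punctuation doesn't lengthen the pause, is to synthesize each sentence separately and splice in silence of a chosen length yourself. A minimal sketch, assuming you already have each sentence's audio as a mono float array and that the sample rate is 24 kHz (check what your pipeline actually returns); `join_with_pauses` is a hypothetical name and writing the result to WAV/MP3 is not shown.

```python
import numpy as np

SAMPLE_RATE = 24_000  # assumption: Kokoro's output rate; confirm for your setup


def join_with_pauses(chunks, pause_ms: int = 600, sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Concatenate per-sentence audio arrays with `pause_ms` of silence between them."""
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    silence = np.zeros(int(sample_rate * pause_ms / 1000), dtype=np.float32)
    pieces = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            pieces.append(silence)  # gap goes between sentences, not before the first
        pieces.append(np.asarray(chunk, dtype=np.float32))
    return np.concatenate(pieces)


# Two one-second placeholder "sentences" joined with a 600 ms gap.
one_second = np.ones(SAMPLE_RATE, dtype=np.float32)
audio = join_with_pauses([one_second, one_second], pause_ms=600)
print(audio.shape)  # (62400,)
```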
@silgon I managed to install Kokoro TTS on my VPS using the CPU. It's really fast, but intonation only works on the Hugging Face space: question marks and similar punctuation have no effect on my install, while on the HF space they work fine. Am I doing something wrong? I've successfully integrated kokoro-fastapi into my n8n workflow.
I don't understand what this means. If I put (-1) in the text, it just reads "minus one".
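Going by the space's description, the number is not meant to stand alone: it goes in the parentheses right after a bracketed word, in the same position as the /phonemes/ in the pronunciation syntax, so a bare "(-1)" is ordinary text and gets read aloud. A minimal sketch of formatting that; `adjust_stress` is a hypothetical helper, and the description notes that raising stress only works on less stressed, usually short words.

```python
def adjust_stress(word: str, levels: int) -> str:
    """Format a stress change as [word](-2|-1|+1|+2).

    Hypothetical helper: the number only acts as the link target of a
    bracketed word; typed on its own, "(-1)" is just text.
    """
    if levels not in (-2, -1, 1, 2):
        raise ValueError("levels must be -2, -1, +1 or +2")
    return f"[{word}]({levels:+d})"


# Reproduces the space's own example line.
example = f"Raise stress 1 level {adjust_stress('or', 2)} 2 levels"
print(example)  # Raise stress 1 level [or](+2) 2 levels
```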
On Hugging Face (https://huggingface.co/spaces/hexgrad/Kokoro-TTS), the input is 'Is it an apple?', but the audio has no rising intonation. Have you found another TTS solution with natural intonation?
Did you manage to solve this on your local install?
In general, I don't find Kokoro very expressive. There is some expression, but it's not very close to actual speech. Used as a verbal "proofreader" for my writing, it works for spotting mechanical errors, awkward wording, etc. It doesn't do well with questions, em dashes, question marks, or exclamation marks.
On the other hand, the TTS options that handle expression better either produce only very short samples or cost more than they're worth. I had hoped Kokoro itself might evolve, but I think it has gone as far as it will go. So I'll mark time until the next really useful alternative shows up, using this repo in the meantime.
Have you tried editing the /app/api/src/services/text_processing/normalizer.py file?
I've managed to do a lot and fill in many of the gaps by adding rules to the normalizer file that match patterns with regular expressions and convert them into their phoneme counterparts.
It might be worth a try.
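A minimal sketch of that regex-to-phoneme idea as a standalone function. It does not reproduce the real normalizer.py; the table and function names are hypothetical, and it assumes the normalizer passes [word](/phonemes/) markup through to the phonemizer unchanged, which is worth verifying against your kokoro-fastapi version. It also does not skip words that are already wrapped in markup.

```python
import re

# Hypothetical override table: word -> phoneme string (example from the HF space).
PHONEME_OVERRIDES = {
    "Kokoro": "kˈOkəɹO",
}

# One case-sensitive pattern matching any override key as a whole word.
_OVERRIDE_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, PHONEME_OVERRIDES)) + r")\b"
)


def apply_phoneme_overrides(text: str) -> str:
    """Rewrite each matched word as [word](/phonemes/) before phonemization."""
    return _OVERRIDE_RE.sub(
        lambda m: f"[{m.group(1)}](/{PHONEME_OVERRIDES[m.group(1)]}/)", text
    )


print(apply_phoneme_overrides("I like Kokoro a lot."))
# I like [Kokoro](/kˈOkəɹO/) a lot.
```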
You could capture the last word and the question mark with a regex and automatically add matching intonation markup.
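A minimal sketch of capturing the word right before a question mark. The suggestion doesn't say which markup to add, so the default here raises stress one level using the space's syntax; that is an assumption, stress is not the same as rising pitch, and it may be inaudible on longer words. The `mark` callable lets you swap in a phoneme override or anything else. All names are hypothetical.

```python
import re
from typing import Callable

# A word (letters, apostrophes, hyphens) immediately followed by one or more "?".
_QUESTION_FINAL = re.compile(r"([A-Za-z][A-Za-z'-]*)(\?+)")


def raise_one_level(word: str) -> str:
    # Assumption: stress markup as a stand-in for rising intonation.
    return f"[{word}](+1)"


def mark_question_final_word(text: str, mark: Callable[[str], str] = raise_one_level) -> str:
    """Apply `mark` to every word that directly precedes a question mark."""
    return _QUESTION_FINAL.sub(lambda m: mark(m.group(1)) + m.group(2), text)


print(mark_question_final_word("Is it an apple?"))      # Is it an [apple](+1)?
print(mark_question_final_word("Is it red? Or blue?"))  # Is it [red](+1)? Or [blue](+1)?
```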
We need "?" intonation :)