Kokoro-FastAPI

[pause] handling

Open flinthamm opened this issue 6 months ago • 7 comments

The new pause handling is great.

However, there seems to be a bug if a pause is placed as the first item in a sentence. If you put a single character first, even a [ or ', or place [pause:1s] in any other position, everything seems fine. With the pause first, the /v1/audio/speech endpoint returns a 500 Internal Server Error and:

{
    "detail": {
        "error": "processing_error",
        "message": "'NoneType' object is not iterable",
        "type": "server_error"
    }
}

The logs show:

kokoro-tts-1  | 09:14:08 AM | INFO     | text_processor:332 | Split completed in 829.77ms, produced 5 chunks (including pauses)
kokoro-tts-1  | 09:14:08 AM | ERROR    | tts_service:432 | Error in audio generation: 'NoneType' object is not iterable
kokoro-tts-1  | 09:14:08 AM | ERROR    | openai_compatible:397 | Unexpected error in speech generation: 'NoneType' object is not iterable
kokoro-tts-1  | INFO:     192.X.X.X:XXXX - "POST /v1/audio/speech HTTP/1.1" 500 Internal Server Error

This has only been tested on Linux with Docker CPU release v0.2.4.
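
For anyone who wants to check the two scenarios, here is a minimal reproduction sketch using only the Python standard library. The base URL, model name, voice ID, and sample texts are assumptions for illustration and were not part of the original report; only the /v1/audio/speech path and the [pause:1s] tag come from the description above.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumed values, not taken from the report: adjust to your deployment.
BASE_URL = "http://localhost:8880"
MODEL = "kokoro"
VOICE = "af_heart"

# Hypothetical sample texts for the scenarios described above.
CASES = {
    "pause first": "[pause:1s] Hello there.",
    "character before pause": "'[pause:1s] Hello there.",
    "pause mid-sentence": "Hello [pause:1s] there.",
}


def build_payload(text: str) -> dict:
    """Build an OpenAI-style speech request body for the given text."""
    return {"model": MODEL, "input": text, "voice": VOICE}


def post_speech(text: str, base_url: str = BASE_URL) -> Optional[int]:
    """POST to /v1/audio/speech; return the HTTP status, or None if unreachable."""
    request = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is on err.code.
        return err.code
    except OSError:
        # Connection refused, timeout, and similar network failures.
        return None


def main() -> None:
    for label, text in CASES.items():
        print(f"{label}: HTTP {post_speech(text)}")


if __name__ == "__main__":
    main()
```

If the behaviour matches the report, only the "pause first" case should come back with status 500.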

flinthamm avatar Jun 30 '25 09:06 flinthamm

@flinthamm can you please give example texts for both scenarios?

fireblade2534 avatar Jul 04 '25 22:07 fireblade2534

Can someone make a version where the voice speaks normally with pauses? It's too fast for an audiobook, for example.

(For example, TTS Open AI has the same voices but they speak normally, slowly)

martinezvl avatar Aug 16 '25 11:08 martinezvl

> Can someone make a version where the voice speaks normally with pauses? It's too fast for an audiobook, for example.
>
> (For example, TTS Open AI has the same voices but they speak normally, slowly)

nicole voice speaks at an audiobook rate. slow & soft

yemo90 avatar Aug 16 '25 16:08 yemo90

> > Can someone make a version where the voice speaks normally with pauses? It's too fast for an audiobook, for example. (For example, TTS Open AI has the same voices but they speak normally, slowly)
>
> nicole voice speaks at an audiobook rate. slow & soft

No, it's not suitable for audiobooks at all. Just compare, for example, onyx Kokoro and onyx with Open AI TTS. You will understand the difference. There are more pronounced pauses.

P.S. Although, maybe the difference isn't that big.

martinezvl avatar Aug 16 '25 16:08 martinezvl

@martinezvl

> Can someone make a version where the voice speaks normally with pauses? It's too fast for an audiobook, for example. (For example, TTS Open AI has the same voices but they speak normally, slowly)

Please clarify - Are you looking for slower speech or are you looking for, as an example, pauses between sentences or paragraphs?

I routinely use Fast Koko (i.e., the web tool) to read back what I've written for my books. Slowing read-back to 0.9x (occasionally 0.8x for one or two female voices) works well.
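
For API users, the same slowdown can probably be requested per call. This is a rough sketch that assumes the endpoint accepts the OpenAI-style `speed` field; the model name and voice ID are assumed values, not something confirmed in this thread.

```python
import json


def build_speech_request(text: str, speed: float = 0.9) -> str:
    """Return a JSON body for an OpenAI-style speech request with a speed setting."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    body = {
        "model": "kokoro",     # assumed model name
        "voice": "af_nicole",  # assumed voice ID
        "input": text,
        "speed": speed,        # 1.0 is normal rate; below 1.0 is slower
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_speech_request("Chapter one.", speed=0.9))
```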

However, the output could be better in several ways (e.g., handling phrases and sentences ending with a question mark, or saying "read back what you just read" correctly), but it certainly beats the bad old days of Mac and Windows robo-voices.

RBEmerson970 avatar Aug 16 '25 18:08 RBEmerson970

> looking for, as an example, pauses between sentences or paragraphs?

I am looking for "pauses between sentences or paragraphs". Of course, Kokoro is already an amazing tool. A lot of cool work has already been done, thanks to you guys!
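
One possible workaround is to insert the [pause:…] tags from this issue into the text before sending it. The sketch below is only an illustration: the helper names are hypothetical, the sentence splitting is a naive regex, and whether fractional durations such as 0.5s are accepted is an assumption (only [pause:1s] appears in this thread). Tags are placed only between units, never first, which also sidesteps the leading-pause error reported above.

```python
import re


def pause_tag(seconds: float) -> str:
    """Format a pause tag, e.g. 1.0 -> '[pause:1s]'."""
    return f"[pause:{seconds:g}s]"


def add_pauses(text: str, sentence_pause: float = 0.0, paragraph_pause: float = 1.0) -> str:
    """Join paragraphs (and optionally sentences) with pause tags."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if sentence_pause > 0:
        tag = pause_tag(sentence_pause)
        # Naive split: whitespace that follows ., ! or ? ends a sentence.
        paragraphs = [
            f" {tag} ".join(re.split(r"(?<=[.!?])\s+", p)) for p in paragraphs
        ]
    separator = f" {pause_tag(paragraph_pause)} " if paragraph_pause > 0 else " "
    return separator.join(paragraphs)


if __name__ == "__main__":
    sample = "One. Two.\n\nThree."
    print(add_pauses(sample, sentence_pause=0.5, paragraph_pause=1))
    # prints: One. [pause:0.5s] Two. [pause:1s] Three.
```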

martinezvl avatar Aug 16 '25 18:08 martinezvl

@martinezvl If you're using Fast Koko, it's pretty much "what you see is what you get". It hasn't changed appreciably in several weeks, despite there being significant problems as it is. For example, reading back directly (not to a file) comes apart at about 10:02 into the text. Any text running close to, or beyond, ten minutes can only be used as a saved MP3, WAV, or PCM file played back through the player of your choice. Fast Koko fumbles the file name (click the "download" button to see it happen), but at least the content is usable.

Frankly, I've concluded Kokoro itself has pretty much ground to a halt. IMNSHO there's a lot of clickbait about swell expression-capable tools, etc., but something for local work that will read back an appreciable amount (one or more chapters, if not a full story)? I'm not seeing it. Pity...

RBEmerson970 avatar Aug 16 '25 19:08 RBEmerson970