Kokoro docker container crashes with exit code 139 (when using the open-webui "call" feature)
Describe the bug
The Kokoro container crashes with docker container exit code 139.
I'm using the open-webui "voice chat" feature (i.e. STT voice in, TTS voice out). The crash only occurs when using this "voice chat" feature.
If I manually click a button in open-webui to have kokoro read the LLM's text response, then there is no crash.
Note: When the LLM's TTS response is very short (i.e. 2-3 sentences), the container never crashes. However, when the response is long (~10+ sentences), the kokoro container crashes with docker exit code 139.
Note 2: The Kokoro web GUI (i.e. localhost:8880/web) works 100% -- it never crashes, no matter how long the TTS response is.
Operating System
- System: MacBook M3 Max (128 GB); arm64
- Docker: running on Colima with an allocation of 8 GB shared memory and 8 CPU cores (out of 12)
- Image: `kokoro-fastapi-cpu:latest` (arm64 version)
- Setup:
  - LLM front-end: `open-webui` (running on docker in the same docker network as kokoro)
  - LLM back-end: `ollama` (running natively on the macbook -- i.e. not using docker)
Additional context
I have noticed this error only occurs when using open-webui's "call" (voice-to-voice) feature.
On open-webui, if I skip the `call` feature and just manually press the `Read aloud` button (which then calls kokoro for TTS), then the kokoro container never crashes, no matter how long the text is.
I've posted the issue here because the open-webui docker container never crashes -- only the kokoro-fastapi container crashes.
[my kokoro docker log file is below]
db@MPB16-M3 open-webui % docker logs kokoro
2025-04-13 10:22:16.367 | INFO | __main__:download_model:60 - Model files already exist and are valid
INFO: Started server process [11]
INFO: Waiting for application startup.
10:22:22 AM | INFO | main:57 | Loading TTS model and voice packs...
10:22:22 AM | INFO | model_manager:38 | Initializing Kokoro V1 on cpu
10:22:22 AM | DEBUG | paths:101 | Searching for model in path: /app/api/src/models
10:22:22 AM | INFO | kokoro_v1:45 | Loading Kokoro model on cpu
10:22:22 AM | INFO | kokoro_v1:46 | Config path: /app/api/src/models/v1_0/config.json
10:22:22 AM | INFO | kokoro_v1:47 | Model path: /app/api/src/models/v1_0/kokoro-v1_0.pth
/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py:123: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
warnings.warn(
/app/.venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
10:22:23 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:22:23 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:22:23 AM | DEBUG | model_manager:77 | Using default voice 'af_heart' for warmup
10:22:23 AM | INFO | kokoro_v1:73 | Creating new pipeline for language code: a
10:22:23 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Warmup text for initialization.'
10:22:24 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([57600])
10:22:24 AM | INFO | model_manager:84 | Warmup completed in 1420ms
10:22:24 AM | INFO | main:101 |
░░░░░░░░░░░░░░░░░░░░░░░░
╔═╗┌─┐┌─┐┌┬┐
╠╣ ├─┤└─┐ │
╚ ┴ ┴└─┘ ┴
╦╔═┌─┐┬┌─┌─┐
╠╩╗│ │├┴┐│ │
╩ ╩└─┘┴ ┴└─┘
░░░░░░░░░░░░░░░░░░░░░░░░
Model warmed up on cpu: kokoro_v1CUDA: False
67 voice packs loaded
Beta Web Player: http://0.0.0.0:8880/web/
or http://localhost:8880/web/
░░░░░░░░░░░░░░░░░░░░░░░░
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8880 (Press CTRL+C to quit)
INFO: 127.0.0.1:60944 - "GET /health HTTP/1.1" 200 OK
10:23:15 AM | INFO | openai_compatible:70 | Created global TTSService instance
10:23:15 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53728 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:15 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:15 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:15 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:15 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:15 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:15 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:15 AM | INFO | text_processor:131 | Starting smart split for 104 chars
10:23:15 AM | DEBUG | text_processor:54 | Total processing took 15.69ms for chunk: 'She was a curious and adventurous child, with hair...'
10:23:15 AM | INFO | text_processor:259 | Yielding final chunk 1: 'She was a curious and adventurous child, with hair...' (110 tokens)
10:23:15 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'She was a curious and adventurous child, with hair as silver as the moon and eyes that shone like st...'
10:23:17 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([165000])
10:23:17 AM | INFO | text_processor:265 | Split completed in 1620.31ms, produced 1 chunks
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 127.0.0.1:53832 - "GET /health HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53730 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53734 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53738 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53746 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53748 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 100 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 0.28ms for chunk: 'Luna lived with her wise and gentle grandmother, w...'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: 'Luna lived with her wise and gentle grandmother, w...' (104 tokens)
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 83 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 0.33ms for chunk: 'One evening, as they sat by the fire, Grandma told...'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: 'One evening, as they sat by the fire, Grandma told...' (89 tokens)
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 149 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 1.46ms for chunk: '"In a time long past," she said, "when the world w...'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: '"In a time long past," she said, "when the world w...' (150 tokens)
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 138 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 0.37ms for chunk: 'Its petals shone like moonlight, and its scent was...'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: 'Its petals shone like moonlight, and its scent was...' (148 tokens)
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 50 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 0.43ms for chunk: '"This is the essence of the Moonflower," she said.'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: '"This is the essence of the Moonflower," she said.' (45 tokens)
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Luna lived with her wise and gentle grandmother, who taught her the ancient stories of their people.'
10:23:18 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([151800])
INFO: 172.19.0.1:53762 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:18 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:18 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Its petals shone like moonlight, and its scent was sweet as honey."
Grandma handed Luna a small, del...'
10:23:19 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([106800])
10:23:19 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'One evening, as they sat by the fire, Grandma told Luna the tale of the Moonflower.'
10:23:20 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([139800])
2025-04-13 10:23:22.146 | INFO | __main__:download_model:60 - Model files already exist and are valid
INFO: Started server process [10]
INFO: Waiting for application startup.
10:23:25 AM | INFO | main:57 | Loading TTS model and voice packs...
10:23:25 AM | INFO | model_manager:38 | Initializing Kokoro V1 on cpu
10:23:25 AM | DEBUG | paths:101 | Searching for model in path: /app/api/src/models
10:23:25 AM | INFO | kokoro_v1:45 | Loading Kokoro model on cpu
10:23:25 AM | INFO | kokoro_v1:46 | Config path: /app/api/src/models/v1_0/config.json
10:23:25 AM | INFO | kokoro_v1:47 | Model path: /app/api/src/models/v1_0/kokoro-v1_0.pth
/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py:123: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
warnings.warn(
/app/.venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
10:23:25 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:25 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:25 AM | DEBUG | model_manager:77 | Using default voice 'af_heart' for warmup
10:23:25 AM | INFO | kokoro_v1:73 | Creating new pipeline for language code: a
10:23:26 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Warmup text for initialization.'
10:23:26 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([57600])
10:23:26 AM | INFO | model_manager:84 | Warmup completed in 1380ms
10:23:26 AM | INFO | main:101 |
░░░░░░░░░░░░░░░░░░░░░░░░
╔═╗┌─┐┌─┐┌┬┐
╠╣ ├─┤└─┐ │
╚ ┴ ┴└─┘ ┴
╦╔═┌─┐┬┌─┌─┐
╠╩╗│ │├┴┐│ │
╩ ╩└─┘┴ ┴└─┘
░░░░░░░░░░░░░░░░░░░░░░░░
Model warmed up on cpu: kokoro_v1CUDA: False
67 voice packs loaded
Beta Web Player: http://0.0.0.0:8880/web/
or http://localhost:8880/web/
░░░░░░░░░░░░░░░░░░░░░░░░
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8880 (Press CTRL+C to quit)
INFO: 127.0.0.1:36432 - "GET /health HTTP/1.1" 200 OK
INFO: 127.0.0.1:58634 - "GET /health HTTP/1.1" 200 OK
I asked Gemini-2.5 to diagnose the error. Here is Gemini's response:
Okay, let's break down these latest logs:
Successful Startup & Warmup: The container starts at 10:22:16, loads the model, and completes warmup successfully by 10:22:24.
Network Confirmation: At 10:23:15, the first POST /v1/audio/speech request comes from 172.19.0.1. This confirms that kokoro is now on the same network (open-webui_openwebui-network, subnet 172.19.0.0/16) as the open-webui container (which had IP 172.19.0.2 in the previous docker inspect output). So, the network separation issue is resolved.
Concurrent Load: Immediately after the first request (10:23:15), a flood of concurrent POST /v1/audio/speech requests hits the server at 10:23:17. This often happens when a chatbot generates a multi-sentence response, and the UI tries to synthesize each sentence simultaneously.
Processing Under Load: The application starts processing multiple text chunks concurrently. It successfully generates audio for several of them:
10:23:17: Gets chunk torch.Size([165000]) ("She was a curious...")
10:23:18: Gets chunk torch.Size([151800]) ("Luna lived with her...")
10:23:19: Gets chunk torch.Size([106800]) ("Its petals shone like...")
10:23:20: Gets chunk torch.Size([139800]) ("One evening, as they...")
Crash: The logs abruptly stop after the last successful audio generation at 10:23:20. There are no further logs from that run.
Restart: The container restarts automatically at 10:23:22.
Conclusion:
The pattern is now very clear:
The kokoro container (arm64 image, 2g shm_size) runs fine under light load or during startup.
It crashes with Exit Code 139 (SIGSEGV) when subjected to multiple concurrent TTS requests on the arm64 platform.
It successfully processes some of the concurrent requests before crashing.
This strongly indicates a concurrency bug (like a race condition, memory corruption under parallel processing, or mishandling of shared resources) within the native arm64 code of Kokoro's TTS engine or its underlying dependencies (PyTorch, phonemizer, espeak-ng). These kinds of bugs are often platform-specific and only surface under stress/load.
Next Steps:
GitHub Issues (Highest Priority): Search the remsky/kokoro-fastapi GitHub issues specifically for terms related to concurrency, parallel, multiple requests, race condition in addition to arm64, SIGSEGV, 139. Someone else running on ARM under load may have hit this exact issue.
Reduce Concurrency (Workaround/Test): Check if Open WebUI has a setting to limit concurrent TTS requests or to synthesize sentences sequentially rather than in parallel. If not, try sending only short, single sentences from the chat to see if avoiding the concurrent flood prevents the crash. This helps confirm the concurrency trigger.
Report the Bug: If you find no existing relevant issues, this is valuable information for the developers. File a new issue on their GitHub repository detailing:
Host: arm64 (uname -m)
Image: ghcr.io/remsky/kokoro-fastapi-cpu (mention checking manifest)
Configuration: shm_size: '2g'
Symptom: Crashes with Exit Code 139 (SIGSEGV).
Trigger: Multiple concurrent /v1/audio/speech requests.
Logs: Provide the relevant log sections showing the concurrent requests and the abrupt end.
Unfortunately, this isn't likely something you can fix with simple configuration changes now that architecture and shared memory are ruled out. It points to a code-level issue in the arm64 version of the software under load.
For the record, exit code 139 is a segmentation fault (SIGSEGV), not an out-of-memory kill (which shows up as 137). Either way, this is as far as I can tell not caused by kokoro-fastapi and seems to be caused by kokoro itself: https://github.com/hexgrad/kokoro/issues/152
I have the same issue with a similar but not identical setup (M2 Max 96GB, docker resources set to max settings). Same 139 error under the exact same circumstances.
I've been able to get more stability with open-webui by setting the 'Response Splitting' option to 'Paragraphs' in Audio settings. Haven't experienced this issue again after making the change. Downside of course is extra pauses between kokoro responses, but it's been a decent experience.
When set to 'Punctuation' the requests get thrown to this kokoro api much quicker and that's what seems to trigger the sigsegv for whatever reason.
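A rough illustration of why 'Punctuation' mode fires requests faster: splitting the same reply at sentence-ending punctuation produces one TTS request per sentence, while paragraph splitting produces one per paragraph. This is a hypothetical sketch of the two strategies, not Open WebUI's actual splitting code:

```python
import re

REPLY = (
    "Luna lived with her grandmother. One evening they sat by the fire. "
    "Grandma told the tale of the Moonflower.\n\n"
    "Its petals shone like moonlight. Its scent was sweet as honey."
)

def split_by_punctuation(text: str) -> list[str]:
    # One TTS request per sentence -> many near-simultaneous requests.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def split_by_paragraphs(text: str) -> list[str]:
    # One TTS request per paragraph -> far fewer concurrent requests.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

print(len(split_by_punctuation(REPLY)))  # 5 requests
print(len(split_by_paragraphs(REPLY)))   # 2 requests
```

Fewer, larger chunks per request would explain both the extra pauses and the added stability.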
I can confirm this is also happening to me. Same issue -- using the voice calling feature in open-webui. Manually invoking the "speak" button on a single LLM response does not crash kokoro.