Kokoro docker container crashes with exit code 139 (when using the open-webui "call" feature)
Describe the bug
The Kokoro container crashes with docker container exit code 139.
I'm using the open-webui "voice chat" feature (i.e. STT voice in, TTS voice out). The crash only occurs when using this "voice chat" feature.
If I manually click a button in open-webui to have kokoro read the LLM's text response, then there is no crash.
Note: When the LLM's TTS response is very short (i.e. 2-3 sentences), the container never crashes. However, when the response is long (~10+ sentences), the kokoro container crashes with docker exit code 139.
Note 2: The Kokoro web GUI (i.e. localhost:8880/web) works 100% -- it never crashes, no matter how long the TTS response is.
Operating System
- System: MacBook M3 Max (128 GB); arm64
- Docker: running on Colima with an allocation of 8 GB shared memory and 8 CPU cores (out of 12)
- Image: `kokoro-fastapi-cpu:latest` (arm64 version)
- Setup:
  - LLM front-end: `open-webui` (running on docker in the same docker network as kokoro)
  - LLM back-end: `ollama` (running natively on the macbook -- i.e. not using docker)
Additional context
I have noticed this error only occurs when using open-webui's "call" (voice-to-voice) feature.
On open-webui, if I skip the `call` feature and just manually press the `Read aloud` button (which then calls kokoro for TTS), then the kokoro container never crashes, no matter how long the text is.
I've posted the issue here because the open-webui docker container never crashes -- only the kokoro-fastapi container crashes.
[my kokoro docker log file is below]
db@MPB16-M3 open-webui % docker logs kokoro
2025-04-13 10:22:16.367 | INFO | __main__:download_model:60 - Model files already exist and are valid
INFO: Started server process [11]
INFO: Waiting for application startup.
10:22:22 AM | INFO | main:57 | Loading TTS model and voice packs...
10:22:22 AM | INFO | model_manager:38 | Initializing Kokoro V1 on cpu
10:22:22 AM | DEBUG | paths:101 | Searching for model in path: /app/api/src/models
10:22:22 AM | INFO | kokoro_v1:45 | Loading Kokoro model on cpu
10:22:22 AM | INFO | kokoro_v1:46 | Config path: /app/api/src/models/v1_0/config.json
10:22:22 AM | INFO | kokoro_v1:47 | Model path: /app/api/src/models/v1_0/kokoro-v1_0.pth
/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py:123: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
warnings.warn(
/app/.venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
10:22:23 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:22:23 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:22:23 AM | DEBUG | model_manager:77 | Using default voice 'af_heart' for warmup
10:22:23 AM | INFO | kokoro_v1:73 | Creating new pipeline for language code: a
10:22:23 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Warmup text for initialization.'
10:22:24 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([57600])
10:22:24 AM | INFO | model_manager:84 | Warmup completed in 1420ms
10:22:24 AM | INFO | main:101 |
░░░░░░░░░░░░░░░░░░░░░░░░
╔═╗┌─┐┌─┐┌┬┐
╠╣ ├─┤└─┐ │
╚ ┴ ┴└─┘ ┴
╦╔═┌─┐┬┌─┌─┐
╠╩╗│ │├┴┐│ │
╩ ╩└─┘┴ ┴└─┘
░░░░░░░░░░░░░░░░░░░░░░░░
Model warmed up on cpu: kokoro_v1CUDA: False
67 voice packs loaded
Beta Web Player: http://0.0.0.0:8880/web/
or http://localhost:8880/web/
░░░░░░░░░░░░░░░░░░░░░░░░
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8880 (Press CTRL+C to quit)
INFO: 127.0.0.1:60944 - "GET /health HTTP/1.1" 200 OK
10:23:15 AM | INFO | openai_compatible:70 | Created global TTSService instance
10:23:15 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53728 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:15 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:15 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:15 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:15 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:15 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:15 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:15 AM | INFO | text_processor:131 | Starting smart split for 104 chars
10:23:15 AM | DEBUG | text_processor:54 | Total processing took 15.69ms for chunk: 'She was a curious and adventurous child, with hair...'
10:23:15 AM | INFO | text_processor:259 | Yielding final chunk 1: 'She was a curious and adventurous child, with hair...' (110 tokens)
10:23:15 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'She was a curious and adventurous child, with hair as silver as the moon and eyes that shone like st...'
10:23:17 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([165000])
10:23:17 AM | INFO | text_processor:265 | Split completed in 1620.31ms, produced 1 chunks
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 127.0.0.1:53832 - "GET /health HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53730 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53734 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53738 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53746 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.19.0.1:53748 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 100 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 0.28ms for chunk: 'Luna lived with her wise and gentle grandmother, w...'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: 'Luna lived with her wise and gentle grandmother, w...' (104 tokens)
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | INFO | openai_compatible:146 | Starting audio generation with lang_code: None
10:23:17 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 83 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 0.33ms for chunk: 'One evening, as they sat by the fire, Grandma told...'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: 'One evening, as they sat by the fire, Grandma told...' (89 tokens)
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 149 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 1.46ms for chunk: '"In a time long past," she said, "when the world w...'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: '"In a time long past," she said, "when the world w...' (150 tokens)
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 138 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 0.37ms for chunk: 'Its petals shone like moonlight, and its scent was...'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: 'Its petals shone like moonlight, and its scent was...' (148 tokens)
10:23:17 AM | DEBUG | tts_service:235 | Using single voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | DEBUG | tts_service:261 | Using voice path: /app/api/src/voices/v1_0/af_bella.pt
10:23:17 AM | INFO | tts_service:265 | Using lang_code 'a' for voice 'af_bella' in audio stream
10:23:17 AM | INFO | text_processor:131 | Starting smart split for 50 chars
10:23:17 AM | DEBUG | text_processor:54 | Total processing took 0.43ms for chunk: '"This is the essence of the Moonflower," she said.'
10:23:17 AM | INFO | text_processor:259 | Yielding final chunk 1: '"This is the essence of the Moonflower," she said.' (45 tokens)
10:23:17 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:17 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Luna lived with her wise and gentle grandmother, who taught her the ancient stories of their people.'
10:23:18 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([151800])
INFO: 172.19.0.1:53762 - "POST /v1/audio/speech HTTP/1.1" 200 OK
10:23:18 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:18 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Its petals shone like moonlight, and its scent was sweet as honey."
Grandma handed Luna a small, del...'
10:23:19 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([106800])
10:23:19 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'One evening, as they sat by the fire, Grandma told Luna the tale of the Moonflower.'
10:23:20 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([139800])
2025-04-13 10:23:22.146 | INFO | __main__:download_model:60 - Model files already exist and are valid
INFO: Started server process [10]
INFO: Waiting for application startup.
10:23:25 AM | INFO | main:57 | Loading TTS model and voice packs...
10:23:25 AM | INFO | model_manager:38 | Initializing Kokoro V1 on cpu
10:23:25 AM | DEBUG | paths:101 | Searching for model in path: /app/api/src/models
10:23:25 AM | INFO | kokoro_v1:45 | Loading Kokoro model on cpu
10:23:25 AM | INFO | kokoro_v1:46 | Config path: /app/api/src/models/v1_0/config.json
10:23:25 AM | INFO | kokoro_v1:47 | Model path: /app/api/src/models/v1_0/kokoro-v1_0.pth
/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py:123: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
warnings.warn(
/app/.venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
10:23:25 AM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
10:23:25 AM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
10:23:25 AM | DEBUG | model_manager:77 | Using default voice 'af_heart' for warmup
10:23:25 AM | INFO | kokoro_v1:73 | Creating new pipeline for language code: a
10:23:26 AM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Warmup text for initialization.'
10:23:26 AM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([57600])
10:23:26 AM | INFO | model_manager:84 | Warmup completed in 1380ms
10:23:26 AM | INFO | main:101 |
░░░░░░░░░░░░░░░░░░░░░░░░
╔═╗┌─┐┌─┐┌┬┐
╠╣ ├─┤└─┐ │
╚ ┴ ┴└─┘ ┴
╦╔═┌─┐┬┌─┌─┐
╠╩╗│ │├┴┐│ │
╩ ╩└─┘┴ ┴└─┘
░░░░░░░░░░░░░░░░░░░░░░░░
Model warmed up on cpu: kokoro_v1CUDA: False
67 voice packs loaded
Beta Web Player: http://0.0.0.0:8880/web/
or http://localhost:8880/web/
░░░░░░░░░░░░░░░░░░░░░░░░
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8880 (Press CTRL+C to quit)
INFO: 127.0.0.1:36432 - "GET /health HTTP/1.1" 200 OK
INFO: 127.0.0.1:58634 - "GET /health HTTP/1.1" 200 OK
I asked Gemini-2.5 to diagnose the error. Here is Gemini's response:
Okay, let's break down these latest logs:
Successful Startup & Warmup: The container starts at 10:22:16, loads the model, and completes warmup successfully by 10:22:24.
Network Confirmation: At 10:23:15, the first POST /v1/audio/speech request comes from 172.19.0.1. This confirms that kokoro is now on the same network (open-webui_openwebui-network, subnet 172.19.0.0/16) as the open-webui container (which had IP 172.19.0.2 in the previous docker inspect output). So, the network separation issue is resolved.
Concurrent Load: Immediately after the first request (10:23:15), a flood of concurrent POST /v1/audio/speech requests hits the server at 10:23:17. This often happens when a chatbot generates a multi-sentence response, and the UI tries to synthesize each sentence simultaneously.
Processing Under Load: The application starts processing multiple text chunks concurrently. It successfully generates audio for several of them:
10:23:17: Gets chunk torch.Size([165000]) ("She was a curious...")
10:23:18: Gets chunk torch.Size([151800]) ("Luna lived with her...")
10:23:19: Gets chunk torch.Size([106800]) ("Its petals shone like...")
10:23:20: Gets chunk torch.Size([139800]) ("One evening, as they...")
Crash: The logs abruptly stop after the last successful audio generation at 10:23:20. There are no further logs from that run.
Restart: The container restarts automatically at 10:23:22.
Conclusion:
The pattern is now very clear:
The kokoro container (arm64 image, 2g shm_size) runs fine under light load or during startup.
It crashes with Exit Code 139 (SIGSEGV) when subjected to multiple concurrent TTS requests on the arm64 platform.
It successfully processes some of the concurrent requests before crashing.
This strongly indicates a concurrency bug (like a race condition, memory corruption under parallel processing, or mishandling of shared resources) within the native arm64 code of Kokoro's TTS engine or its underlying dependencies (PyTorch, phonemizer, espeak-ng). These kinds of bugs are often platform-specific and only surface under stress/load.
Next Steps:
GitHub Issues (Highest Priority): Search the remsky/kokoro-fastapi GitHub issues specifically for terms related to concurrency, parallel, multiple requests, race condition in addition to arm64, SIGSEGV, 139. Someone else running on ARM under load may have hit this exact issue.
Reduce Concurrency (Workaround/Test): Check if Open WebUI has a setting to limit concurrent TTS requests or to synthesize sentences sequentially rather than in parallel. If not, try sending only short, single sentences from the chat to see if avoiding the concurrent flood prevents the crash. This helps confirm the concurrency trigger.
Report the Bug: If you find no existing relevant issues, this is valuable information for the developers. File a new issue on their GitHub repository detailing:
Host: arm64 (uname -m)
Image: ghcr.io/remsky/kokoro-fastapi-cpu (mention checking manifest)
Configuration: shm_size: '2g'
Symptom: Crashes with Exit Code 139 (SIGSEGV).
Trigger: Multiple concurrent /v1/audio/speech requests.
Logs: Provide the relevant log sections showing the concurrent requests and the abrupt end.
Unfortunately, this isn't likely something you can fix with simple configuration changes now that architecture and shared memory are ruled out. It points to a code-level issue in the arm64 version of the software under load.
For the record, exit code 139 is a segmentation fault (SIGSEGV), not an out-of-memory kill (which shows up as 137). Either way, this is as far as I can tell not caused by kokoro-fastapi and seems to be caused by kokoro itself: https://github.com/hexgrad/kokoro/issues/152
I have the same issue with a similar but not identical setup (M2 Max 96GB, docker resources set to max settings). Same 139 error under the exact same circumstances.
I've been able to get more stability with open-webui by setting the 'Response Splitting' option to 'Paragraphs' in Audio settings. Haven't experienced this issue again after making the change. Downside of course is extra pauses between kokoro responses, but it's been a decent experience.
When set to 'Punctuation' the requests get thrown to this kokoro api much quicker and that's what seems to trigger the sigsegv for whatever reason.
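A rough illustration of why 'Punctuation' mode fires requests faster: splitting the same reply at sentence-ending punctuation produces one TTS request per sentence, while paragraph splitting produces one per paragraph. This is a hypothetical sketch of the two strategies, not Open WebUI's actual splitting code:

```python
import re

REPLY = (
    "Luna lived with her grandmother. One evening they sat by the fire. "
    "Grandma told the tale of the Moonflower.\n\n"
    "Its petals shone like moonlight. Its scent was sweet as honey."
)

def split_by_punctuation(text: str) -> list[str]:
    # One TTS request per sentence -> many near-simultaneous requests.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def split_by_paragraphs(text: str) -> list[str]:
    # One TTS request per paragraph -> far fewer concurrent requests.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

print(len(split_by_punctuation(REPLY)))  # 5 requests
print(len(split_by_paragraphs(REPLY)))   # 2 requests
```

Fewer, larger chunks per request would explain both the extra pauses and the added stability.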
I can confirm this is also happening to me. Same issue -- using the voice calling feature in open-webui. Manually invoking the "speak" button on a single LLM response does not crash kokoro.