Error: Error generating speech: Failed to execute 'endOfStream' on 'MediaSource': The 'updating' attribute is true on one or more of this MediaSource's SourceBuffers.
Describe the bug: It was working great earlier; now I'm getting this error:
Error generating speech: Failed to execute 'endOfStream' on 'MediaSource': The 'updating' attribute is true on one or more of this MediaSource's SourceBuffers.
Branch / Deployment used: Docker CPU quickstart version (docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.2)
Operating System: Debian
I have the same problem on Windows.
Weird. It worked great all day yesterday but now this error popped up today on some of the voices. I wonder if the API is having issues?
@ItsNoted can you post the full logs?
I don't think that will help since the installation doesn't fail, but here you go (Windows PowerShell):
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.2
Unable to find image 'ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.2' locally
v0.2.2: Pulling from remsky/kokoro-fastapi-cpu
fac405487e7b: Pulling fs layer
fac405487e7b: Download complete
15cd46d12611: Download complete
d94653bc0bd7: Download complete
614c4f55d6a2: Download complete
31312498c845: Download complete
041a0f34698b: Download complete
406e12cf3cd9: Download complete
4f4fb700ef54: Already exists
87ccd60e8dd6: Download complete
43d164395a1c: Download complete
Digest: sha256:76549cce3c5cc5ed4089619a9cffc3d39a041476ff99c5138cd18b6da832c4d7
Status: Downloaded newer image for ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.2
2025-02-18 14:42:39.528 | INFO | main:download_model:60 - Model files already exist and are valid
Building kokoro-fastapi @ file:///app
Built kokoro-fastapi @ file:///app
Uninstalled 1 package in 1ms
Installed 1 package in 1ms
INFO: Started server process [31]
INFO: Waiting for application startup.
02:42:54 PM | INFO | main:57 | Loading TTS model and voice packs...
02:42:54 PM | INFO | model_manager:38 | Initializing Kokoro V1 on cpu
02:42:54 PM | DEBUG | paths:101 | Searching for model in path: /app/api/src/models
02:42:54 PM | INFO | kokoro_v1:45 | Loading Kokoro model on cpu
02:42:54 PM | INFO | kokoro_v1:46 | Config path: /app/api/src/models/v1_0/config.json
02:42:54 PM | INFO | kokoro_v1:47 | Model path: /app/api/src/models/v1_0/kokoro-v1_0.pth
/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py:123: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
  warnings.warn(
/app/.venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
WeightNorm.apply(module, name, dim)
02:42:55 PM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
02:42:55 PM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
02:42:55 PM | DEBUG | model_manager:77 | Using default voice 'af_heart' for warmup
02:42:55 PM | INFO | kokoro_v1:73 | Creating new pipeline for language code: a
02:42:56 PM | DEBUG | kokoro_v1:244 | Generating audio for text with lang_code 'a': 'Warmup text for initialization.'
02:42:57 PM | DEBUG | kokoro_v1:251 | Got audio chunk with shape: torch.Size([57600])
02:42:57 PM | INFO | model_manager:84 | Warmup completed in 2782ms
02:42:57 PM | INFO | main:101 |
░░░░░░░░░░░░░░░░░░░░░░░░
╔═╗┌─┐┌─┐┌┬┐
╠╣ ├─┤└─┐ │
╚ ┴ ┴└─┘ ┴
╦╔═┌─┐┬┌─┌─┐
╠╩╗│ │├┴┐│ │
╩ ╩└─┘┴ ┴└─┘
░░░░░░░░░░░░░░░░░░░░░░░░
Model warmed up on cpu: kokoro_v1
CUDA: False
67 voice packs loaded
Beta Web Player: http://0.0.0.0:8880/web/ or http://localhost:8880/web/ ░░░░░░░░░░░░░░░░░░░░░░░░
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8880 (Press CTRL+C to quit)
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
INFO: 172.17.0.1:43556 - "GET /web/ HTTP/1.1" 200 OK
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
INFO: 172.17.0.1:43556 - "GET /web/styles/base.css HTTP/1.1" 200 OK
INFO: 172.17.0.1:43594 - "GET /web/styles/forms.css HTTP/1.1" 200 OK
INFO: 172.17.0.1:43578 - "GET /web/styles/header.css HTTP/1.1" 200 OK
INFO: 172.17.0.1:43568 - "GET /web/styles/layout.css HTTP/1.1" 200 OK
INFO: 172.17.0.1:43596 - "GET /web/styles/player.css HTTP/1.1" 200 OK
INFO: 172.17.0.1:43610 - "GET /web/styles/responsive.css HTTP/1.1" 200 OK
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
INFO: 172.17.0.1:43556 - "GET /web/styles/badges.css HTTP/1.1" 200 OK
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
INFO: 172.17.0.1:43578 - "GET /web/styles/controls.css HTTP/1.1" 200 OK
INFO: 172.17.0.1:43594 - "GET /web/src/App.js HTTP/1.1" 200 OK
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
INFO: 172.17.0.1:43594 - "GET /web/src/services/VoiceService.js HTTP/1.1" 200 OK
INFO: 172.17.0.1:43596 - "GET /web/src/state/PlayerState.js HTTP/1.1" 200 OK
INFO: 172.17.0.1:43578 - "GET /web/src/services/AudioService.js HTTP/1.1" 200 OK
INFO: 172.17.0.1:43556 - "GET /web/src/components/PlayerControls.js HTTP/1.1" 200 OK
INFO: 172.17.0.1:43568 - "GET /web/src/components/WaveVisualizer.js HTTP/1.1" 200 OK
INFO: 172.17.0.1:43610 - "GET /web/src/components/VoiceSelector.js HTTP/1.1" 200 OK
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
INFO: 172.17.0.1:43594 - "GET /web/src/components/TextEditor.js HTTP/1.1" 200 OK
02:43:54 PM | INFO | openai_compatible:65 | Created global TTSService instance
02:43:54 PM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.17.0.1:43578 - "GET /v1/audio/voices HTTP/1.1" 200 OK
02:43:54 PM | DEBUG | paths:307 | Searching for web file in path: /app/web
02:44:10 PM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.17.0.1:45700 - "POST /v1/audio/speech HTTP/1.1" 200 OK
02:44:10 PM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
02:44:10 PM | INFO | openai_compatible:135 | Starting audio generation with lang_code: None
02:44:10 PM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
02:44:10 PM | DEBUG | tts_service:228 | Using single voice path: /app/api/src/voices/v1_0/af_alloy.pt
02:44:10 PM | DEBUG | tts_service:253 | Using voice path: /app/api/src/voices/v1_0/af_alloy.pt
02:44:10 PM | INFO | tts_service:257 | Using lang_code 'a' for voice 'af_alloy' in audio stream
02:44:10 PM | INFO | text_processor:114 | Starting smart split for 6 chars
02:44:10 PM | DEBUG | text_processor:51 | Total processing took 16.50ms for chunk: 'Testt!'
02:44:10 PM | INFO | text_processor:236 | Yielding final chunk 1: 'Testt!' (6 tokens)
02:44:10 PM | DEBUG | kokoro_v1:244 | Generating audio for text with lang_code 'a': 'Testt!'
02:44:11 PM | DEBUG | kokoro_v1:251 | Got audio chunk with shape: torch.Size([28200])
02:44:11 PM | INFO | text_processor:242 | Split completed in 686.53ms, produced 1 chunks
02:44:17 PM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
INFO: 172.17.0.1:51710 - "POST /v1/audio/speech HTTP/1.1" 200 OK
02:44:17 PM | DEBUG | paths:153 | Scanning for voices in path: /app/api/src/voices/v1_0
02:44:17 PM | INFO | openai_compatible:135 | Starting audio generation with lang_code: None
02:44:17 PM | DEBUG | paths:131 | Searching for voice in path: /app/api/src/voices/v1_0
02:44:17 PM | DEBUG | tts_service:228 | Using single voice path: /app/api/src/voices/v1_0/af_alloy.pt
02:44:17 PM | DEBUG | tts_service:253 | Using voice path: /app/api/src/voices/v1_0/af_alloy.pt
02:44:17 PM | INFO | tts_service:257 | Using lang_code 'a' for voice 'af_alloy' in audio stream
02:44:17 PM | INFO | text_processor:114 | Starting smart split for 6 chars
02:44:17 PM | DEBUG | text_processor:51 | Total processing took 0.46ms for chunk: 'Testt!'
02:44:17 PM | INFO | text_processor:236 | Yielding final chunk 1: 'Testt!' (6 tokens)
02:44:17 PM | DEBUG | kokoro_v1:244 | Generating audio for text with lang_code 'a': 'Testt!'
02:44:18 PM | DEBUG | kokoro_v1:251 | Got audio chunk with shape: torch.Size([28200])
02:44:18 PM | INFO | text_processor:242 | Split completed in 507.60ms, produced 1 chunks
Mine was very similar. I no longer have the logs as I have turned off the container for now. I cannot seem to reproduce it again either. Strange series of events.
Interestingly, when I try to build the whole thing from scratch it errors out.
cpu docker compose up --build
[+] Building 1.6s (18/18) FINISHED    docker:desktop-linux
 => [kokoro-tts internal] load build definition from Dockerfile    0.0s
 => => transferring dockerfile: 1.83kB    0.0s
 => [kokoro-tts internal] load metadata for docker.io/library/python:3.10-slim    1.3s
 => [kokoro-tts internal] load .dockerignore    0.0s
 => => transferring context: 407B    0.0s
 => [kokoro-tts stage-0 1/12] FROM docker.io/library/python:3.10-slim@sha256:66aad90b231f011cb80e1    0.0s
 => => resolve docker.io/library/python:3.10-slim@sha256:66aad90b231f011cb80e1966e03526a7175f058672    0.0s
 => [kokoro-tts internal] load build context    0.0s
 => => transferring context: 7.69kB    0.0s
 => CACHED [kokoro-tts stage-0 2/12] RUN apt-get update && apt-get install -y espeak-ng es    0.0s
 => CACHED [kokoro-tts stage-0 3/12] RUN curl -LsSf https://astral.sh/uv/install.sh | sh && mv    0.0s
 => CACHED [kokoro-tts stage-0 4/12] RUN useradd -m -u 1000 appuser && mkdir -p /app/api/src/m    0.0s
 => CACHED [kokoro-tts stage-0 5/12] WORKDIR /app    0.0s
 => CACHED [kokoro-tts stage-0 6/12] COPY --chown=appuser:appuser pyproject.toml ./pyproject.toml    0.0s
 => CACHED [kokoro-tts stage-0 7/12] RUN --mount=type=cache,target=/root/.cache/uv uv venv --p    0.0s
 => CACHED [kokoro-tts stage-0 8/12] COPY --chown=appuser:appuser api ./api    0.0s
 => CACHED [kokoro-tts stage-0 9/12] COPY --chown=appuser:appuser web ./web    0.0s
 => CACHED [kokoro-tts stage-0 10/12] COPY --chown=appuser:appuser docker/scripts/ ./    0.0s
 => CACHED [kokoro-tts stage-0 11/12] RUN chmod +x ./entrypoint.sh    0.0s
 => CACHED [kokoro-tts stage-0 12/12] RUN if [ "true" = "true" ]; then python download_model.py    0.0s
 => [kokoro-tts] exporting to image    0.1s
 => => exporting layers    0.0s
 => => exporting manifest sha256:9e328694d0e783ffb7b500f305800f96f6ca3923603be360ad915839203a8f82    0.0s
 => => exporting config sha256:9e474636700e2d5226f15aa3f4a85a221cd78df1000a78ad86cc178bb4ff42e2    0.0s
 => => exporting attestation manifest sha256:1d74c280e98a50efeb21f0b79175dea2bceb51bc69ffea2863b434    0.0s
 => => exporting manifest list sha256:b35250eb6432e3c2fc14f03bb7a4c2d08ec741db9d63e43fcda696b69aea2    0.0s
 => => naming to docker.io/library/kokoro-fastapi-cpu-kokoro-tts:latest    0.0s
 => => unpacking to docker.io/library/kokoro-fastapi-cpu-kokoro-tts:latest    0.0s
 => [kokoro-tts] resolving provenance for metadata file    0.0s
[+] Running 3/3
 ✔ kokoro-tts                                 Built      0.0s
 ✔ Network kokoro-fastapi-cpu_default         Created    0.3s
 ✔ Container kokoro-fastapi-cpu-kokoro-tts-1  Created    0.1s
Attaching to kokoro-tts-1
kokoro-tts-1  | exec ./entrypoint.sh: no such file or directory
kokoro-tts-1 exited with code 1
I presume the error happened right after your last log?
This is likely unrelated. If you are on Windows, it's probably because Git tries to convert the line endings, and that breaks its ability to find the file for some reason.
Try running this (it makes all repos keep Linux line endings):
git config --global core.autocrlf false
then this (or re-download the repo):
git add --renormalize .
Okay, regarding your first question: I did it on a completely new installation of Docker. I cleared my console before running the command, and what you see is the log that resulted from that. I'm on Windows but run Docker through WSL 2. Next question: how do I do what you described?
So where does the "Error generating speech: Failed to execute 'endOfStream' on 'MediaSource': The 'updating' attribute is true on one or more of this MediaSource's SourceBuffers." come from then?
Go into the folder that you cloned onto your machine and execute the commands in your WSL console. If you want the line-ending setting on your Windows machine too, execute the first command in your Windows terminal as well.
The web interface loads, and as soon as I try to generate an audio file this error pops up.
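(For context: this particular message is a DOMException raised by the browser's Media Source Extensions API inside the web player, not by the FastAPI server. endOfStream() throws it whenever any SourceBuffer still has updating === true, i.e. an appendBuffer() call is still in flight. A minimal JavaScript sketch of the usual guard follows; the function name finishStream and its parameters are hypothetical illustration, not the project's actual AudioService.js code:

    // Hypothetical sketch: only call endOfStream() once no SourceBuffer is updating.
    function finishStream(mediaSource, sourceBuffer) {
      if (sourceBuffer.updating) {
        // An appendBuffer() is still being processed; defer until it finishes.
        sourceBuffer.addEventListener('updateend', () => {
          if (mediaSource.readyState === 'open') mediaSource.endOfStream();
        }, { once: true });
      } else if (mediaSource.readyState === 'open') {
        mediaSource.endOfStream();
      }
    }

Calling something like this from the stream-completion handler avoids the race between the last appendBuffer() and endOfStream().)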
That's exactly what I did. I ran Docker, opened a console, and executed the command directly in the cloned Git folder. WSL was installed as part of my Docker installation. I don't know what you mean by line endings.
So it pops up on the web UI then?
OK, and does building it work? A line ending is how an operating system represents a new line: on Windows it's "\r\n" and on Linux it's "\n".
Yes, it pops up on the web UI. It does not build if you execute the commands shown on the README page. I still don't quite understand what you are trying to tell me about line endings. Is that just some formatting thing? I can only refer you to the log above. The build fails with kokoro-tts-1 exited with code 1.
OK. What I mean by the line endings is that I had a similar issue to yours where it wasn't able to find entrypoint.sh. It turns out that Git was changing Linux line endings to Windows ones, and that was producing the same error. Regarding my question "ok and does building it work?", I meant: after executing the commands I gave you, did it work? Also, are you using the GPU container or the CPU container?
Okay, using your commands I was able to build everything from source, but the error in the web UI remains. I'm using the CPU version.
What browser and generation settings are you using?
Thank you for the question. It was a Firefox problem! In Vivaldi it works as expected.
Currently running into the same issue. I have tried 3 browsers (Chrome, Edge, Firefox).
Installed via docker run, GPU version, using: docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.2
I can connect to the web interface. Upon clicking Generate Speech I get this error: "Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource"
Changing the uploaded text, model, or file type doesn't seem to do anything for me.
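(This addSourceBuffer variant is a slightly different failure: it means the browser rejected the MIME type the web player passed to MediaSource.addSourceBuffer(), and Media Source Extensions support for specific audio MIME types varies between browsers. A hypothetical JavaScript snippet you could paste into the browser console on the /web page to see which types your browser accepts; the candidate MIME list is an assumption, not the list the player actually uses:

    // Hypothetical check: log which audio MIME types this browser's MediaSource accepts.
    const candidates = [
      'audio/mpeg',
      'audio/mp4; codecs="mp4a.40.2"',
      'audio/webm; codecs="opus"',
    ];
    for (const type of candidates) {
      console.log(type, MediaSource.isTypeSupported(type));
    }

If the type the player requests logs false, that matches the "Type not supported" message.)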
Yeah, I don't think it's related to the browser, because I checked and I'm getting it too.
I wouldn't know about the GPU version since I use the CPU one. Before doing anything else, try installing Vivaldi; I'm 99% sure that this will work. If not, it is always a good idea to build from source. I suspect if you are on Windows you will run into the same problem I did. Here are the commands that should work to build it:
git config --global core.autocrlf false
git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI
cd docker/gpu
docker compose up --build
Make sure you delete any previous downloads of the Kokoro repository you might have.
The issue that I helped you with had nothing to do with the main issue of this thread; your issue was that the Docker build could not find ./entrypoint.sh.
Yes, I know, but I've also learned that building things from scratch often helps with such problems.
Still errors. How can I fix this error? Thanks.
What error are you getting, and are you using the CPU or the GPU build?
I got this error as well while testing the UI. Running Docker with the CPU version on Windows. I noticed that the error occurs in Firefox (135.0.1, 64-bit), but Chrome runs this without issues.
I'm using the GPU build. I get an error with the message "Error generating speech: Failed to execute 'endOfStream' on 'MediaSource': The 'updating' attribute is true on one or more of this MediaSource's SourceBuffers." in all browsers when using http://localhost:8880/web
It's a Firefox issue. I can confirm that the programme works in Vivaldi as well as Brave.
The old version worked in Firefox, but with the latest version I'm getting the MediaSource error. It works in Chrome though. Using the CPU Docker container.