Write an Open WebUI integration tutorial
https://docs.openwebui.com/category/-integrations
If someone wants to take this on please let me know
the current link for tutorials seems to be https://docs.openwebui.com/category/-integrations, right?
Yes. I've updated the original description.
I can do this as soon as I get it working. Speech to Text (STT) is working perfectly, but I keep getting a "Model not found" for Text to Speech (TTS).
Edit: I got it working. I can write up an integration guide if it needs to be done. Heading to sleep. Will probably tackle it next week sometime. I just got done tying Open-WebUI to vLLM (for concurrent generation) & Speaches (All-in-one STT/TTS), and it's working very well so far. I just need to tie in an image generator and OCR and I'll be writing a master integration guide.
But I can write an article first... after sleep and some brain rest.
Any news?
@fedirz @pschaer @m3lander
Below is what I have so far. It needs more testing; I don't feel it's production-ready yet, but it might get someone 90% there.
You also have to download the voices manually for TTS before it will completely work: sudo docker exec -it openwebui-speaches-1 huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
I'll try to find time to polish it off, but if someone else wants to finish it before I have time... please do so! I have it in Word, but GitHub won't let me upload a docx.
One issue: I know now that what's below is wrong for volumes (it uses /home/ubuntu/.cache/huggingface, so that should be the volume target), but I got permission errors when I ran it last time. I have not figured out how to mitigate that yet...
I think to make production ready:
- I'd need to figure out the permissions issue for the volume (the huggingface cache being owned by root vs the ubuntu user; I ran chmod 777 on the huggingface cache locally).
- I'd need to manually specify the command in the compose.yaml and add the model download (the huggingface cache needs to be persistent first so it doesn't re-download each time).
- I'd need to figure out a volume map that isn't hardcoded to my user. I was playing with /data/speaches-ai, but with a sudo docker compose up it's created as root.
- I'd need to test it from fresh to make certain everything works as expected.
I also think I had CHAT_COMPLETION_BASE_URL set to my local vLLM instance when I had it working. I didn't think it was used or required, so I took it out, but it's an untested configuration change. Basically: yes, I have made progress; yes, I have it working internally; but I haven't solved all the issues and it requires more steps.
Speaches.ai Open-WebUI Integration Guide
The purpose of this guide is to help integrate Speaches with Open-WebUI. Moving the processing outside the Open-WebUI container significantly lowers Open-WebUI's CPU utilization and improves its UI responsiveness. It also allows CUDA acceleration of both STT & TTS independently of Open-WebUI.
Open-WebUI Deployment and Configuration
Open-WebUI settings are stored in a database, meaning that after deployment the settings persist at whatever the initial defaults were, or whatever was provided via environment variables. So upon initial deployment, Open-WebUI can already be set up to use the speaches.ai container.
Pre-deployment
Docker Compose
Add to your Open-WebUI compose.yaml file
services:
open-webui:
…
environment:
- AUDIO_STT_ENGINE=openai
- AUDIO_STT_MODEL=Systran/faster-distil-whisper-small.en
- AUDIO_STT_OPENAI_API_BASE_URL=http://speaches:8000/v1
- AUDIO_STT_OPENAI_API_KEY=<YOUR_OPEN-WEBUI_API_KEY>
- AUDIO_TTS_ENGINE=openai
- AUDIO_TTS_MODEL=rhasspy/piper-voices
- AUDIO_TTS_OPENAI_API_BASE_URL=http://speaches:8000/v1
- AUDIO_TTS_OPENAI_API_KEY=<YOUR_OPEN-WEBUI_API_KEY>
- AUDIO_TTS_VOICE=en_US-amy-medium
speaches:
image: "ghcr.io/speaches-ai/speaches:latest-cuda"
restart: unless-stopped
runtime: nvidia
volumes:
- /home/<YOUR_USERNAME>/.cache/huggingface:/root/.cache/huggingface
environment:
- NVIDIA_VISIBLE_DEVICES=all
- HUGGING_FACE_HUB_TOKEN=<YOUR_HUGGINGFACE_TOKEN>
- ENABLE_UI=false
- CHAT_COMPLETION_BASE_URL=None
- API_KEY=<YOUR_OPEN-WEBUI_API_KEY>
- DEFAULT_LANGUAGE=en
- COMPUTE_TYPE__USE_BATCHED_MODE=false
- OPENAI_BASE_URL=http://open-webui:8080/v1
- OPENAI_API_KEY=<YOUR_OPEN-WEBUI_API_KEY>
- WHISPER__COMPUTE_TYPE=int8
- WHISPER__MODEL=Systran/faster-distil-whisper-small.en
- TRANSCRIPTION_BASE_URL=http://speaches:8000/v1/audio/transcriptions
- TRANSCRIPTION_API_KEY=<YOUR_OPEN-WEBUI_API_KEY>
- SPEECH_EXTRA_BODY__SAMPLE_RATE=8000
- SPEECH_BASE_URL=http://speaches:8000/v1/audio/speech
- SPEECH_API_KEY=<YOUR_OPEN-WEBUI_API_KEY>
- SPEECH_MODEL=rhasspy/piper-voices
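If you want dependent services to wait until Speaches is actually up (not just started), a healthcheck can be added to the speaches service. This is a sketch: the /health route and the bearer-token header are assumptions based on this setup, so adjust them to your deployment:

```yaml
speaches:
  # ... image/runtime/volumes/environment as above ...
  healthcheck:
    # Poll Speaches until it answers; fail the check on any non-2xx response.
    test: ["CMD-SHELL", "curl -f http://speaches:8000/health -H 'Authorization: Bearer <YOUR_OPEN-WEBUI_API_KEY>'"]
    interval: 20s
    timeout: 5s
    retries: 60
```

Other services (e.g. open-webui) can then reference it with depends_on and condition: service_healthy so they only start once the check passes.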
Docker Run
sudo docker run -d --name=speaches --runtime nvidia --gpus=all --restart=unless-stopped \
-v /home/<YOUR_USERNAME>/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--env "HUGGING_FACE_HUB_TOKEN=<YOUR_HUGGINGFACE_TOKEN>" \
--env "ENABLE_UI=false" \
--env "CHAT_COMPLETION_BASE_URL=None" \
--env "API_KEY=<YOUR_OPEN-WEBUI_API_KEY>" \
--env "DEFAULT_LANGUAGE=en" \
--env "COMPUTE_TYPE__USE_BATCHED_MODE=false" \
--env "OPENAI_BASE_URL=http://<OPEN-WEBUI-IP/NAME>:8080/v1" \
--env "OPENAI_API_KEY=<YOUR_OPEN-WEBUI_API_KEY>" \
--env "WHISPER__COMPUTE_TYPE=int8" \
--env "WHISPER__MODEL=Systran/faster-distil-whisper-small.en" \
--env "TRANSCRIPTION_BASE_URL=http://<SPEACHES-IP/NAME>:8000/v1/audio/transcriptions" \
--env "TRANSCRIPTION_API_KEY=<YOUR_OPEN-WEBUI_API_KEY>" \
--env "SPEECH_EXTRA_BODY__SAMPLE_RATE=8000" \
--env "SPEECH_BASE_URL=http://<SPEACHES-IP/NAME>:8000/v1/audio/speech" \
--env "SPEECH_API_KEY=<YOUR_OPEN-WEBUI_API_KEY>" \
--env "SPEECH_MODEL=rhasspy/piper-voices" \
ghcr.io/speaches-ai/speaches:latest-cuda
Post-deployment
In Open-WebUI, open ‘Admin Panel:Settings:Audio’.
^ This is great and already helped me infinitely in getting my STT+TTS set up. I think OWUI could probably do with some clarity on their end, to indicate to users that the "OpenAI" option isn't limited to OpenAI URLs only, but in the meantime this does a great job of walking through the process.
Highly recommend this goes up on the docs site ASAP for new users looking for answers, as this thread took me a while to find.
Yes, sorry guys. I have made some progress on these issues. One fix was to specify the user as root so the huggingface cache dir permissions didn't clash with vllm (I could also just specify another dir for the volume), and another was to change the entrypoint to download the model via huggingface-cli && the original entrypoint command. But I'm on a business trip in Germany and just have too much stuff piled up for the moment. After the trip ends in early June and I catch up on some stuff at home is the earliest I can polish it off.
Like I said, if anyone can finish it off earlier, feel free to run with it. Honestly, I'm not a docker wizard myself. I actually learned a lot setting up a docker compose app with open-webui, vllm for generation and embedding, comfy-ui compiled from source for image generation, and caddy. So some of these problems might be trivial to the right people.
Eventually I want to write an entire Open-WebUI guide but I have way too much stuff on my plate right now.
This isn't working for me... I get strange 404 errors when Open WebUI tries to hit it. Also, my WebUI doesn't honor the env vars; I still have to configure it manually from the UI.
Open WebUI - v0.6.15
INFO: 172.27.0.3:36628 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43244 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43248 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43256 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43264 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43266 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43280 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43292 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43306 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
INFO: 172.27.0.3:43308 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
@dgshue Persistent Open-WebUI environment variables are only recognized on a fresh install. If they are not set on initial load, the settings are written to defaults, and changing the variables later will not make a difference.
I'd recommend using names instead of IPs. It should theoretically work if you specify the IP, but it's better to use names as in the example.
If it would help, here is my current compose.yaml:
services:
caddy:
...
configs:
- target: /etc/caddy/Caddyfile
source: Caddyfile
open-webui:
...
vllm-generate:
...
vllm-embed:
...
speaches:
restart: unless-stopped
image: "ghcr.io/speaches-ai/speaches:latest-cuda"
runtime: nvidia
deploy:
resources:
reservations:
devices:
- driver: nvidia
# count: 1
device_ids: ['0']
capabilities: [gpu]
user: ${CONTAINER_USER}
volumes:
- ${HUGGINGFACE_PATH}:/home/ubuntu/.cache/huggingface
- ${DOWNLOADED_MODELS_PATH}:/data/models
healthcheck:
test: ["CMD-SHELL", "curl -f http://speaches:8000/health -H 'Authorization: Bearer ${INTERNAL_TOKEN}'"]
interval: 20s
timeout: 5s
retries: 60
environment:
- NVIDIA_VISIBLE_DEVICES=0
- HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
- API_KEY=${INTERNAL_TOKEN}
- ENABLE_UI=false
- CHAT_COMPLETION_BASE_URL=http://vllm-generate:8000/v1
- CHAT_COMPLETION_API_KEY=${INTERNAL_TOKEN}
- DEFAULT_LANGUAGE=en
- COMPUTE_TYPE__USE_BATCHED_MODE=false
- WHISPER__COMPUTE_TYPE=int8
- WHISPER__MODEL=Systran/faster-distil-whisper-small.en
- TRANSCRIPTION_BASE_URL=http://speaches:8000/v1/audio/transcriptions
- TRANSCRIPTION_API_KEY=${INTERNAL_TOKEN}
- SPEECH_EXTRA_BODY__SAMPLE_RATE=8000
- SPEECH_BASE_URL=http://speaches:8000/v1/audio/speech
- SPEECH_API_KEY=${INTERNAL_TOKEN}
- SPEECH_MODEL=rhasspy/piper-voices
command: /bin/bash -c "huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json' && /opt/nvidia/nvidia_entrypoint.sh uvicorn --factory speaches.main:create_app"
comfyui:
...
configs:
Caddyfile:
name: Caddyfile
content: |
https://${CADDY_NAME_IP}:${OPEN_WEBUI_PORT} {
tls internal
reverse_proxy open-webui:8080
}
https://${CADDY_NAME_IP}:${VLLM_PORT} {
tls internal
reverse_proxy vllm-generate:8000
}
https://${CADDY_NAME_IP}:${COMFYUI_PORT} {
tls internal
reverse_proxy comfyui:8188
}
Here is my .env file:
# GENERAL
CONTAINER_USER="root"
CADDY_NAME_IP="localhost"
INTERNAL_TOKEN="<YOUR_INTERNAL_TOKEN_HERE>"
HUGGING_FACE_HUB_TOKEN="<YOUR_HF_TOKEN_HERE>"
# PORTS
OPEN_WEBUI_PORT="8080"
VLLM_PORT="8000"
COMFYUI_PORT="8188"
# PATHS
CADDY_PATH="/data/caddy"
OPEN_WEBUI_PATH="/data/open-webui-volumes"
COMFYUI_PATH="/data/comfyui"
HUGGINGFACE_PATH="/data/huggingface"
DOWNLOADED_MODELS_PATH="/data/models"
Edit: Cleaning up thread.
I have the same problem as @dgshue in my Open WebUI logs:
open_webui.routers.audio:get_available_voices:1015 - Error fetching voices from custom endpoint: 404 Client Error: Not Found for url: http://speaches/v1/audio/voices
open_webui.routers.audio:get_available_models:968 - Error fetching models from custom endpoint: 404 Client Error: Not Found for url: http://speaches/v1/audio/models
I think Open WebUI looks by default at the /audio/voices & /audio/models paths, while speaches exposes only /voices & /models. Do you agree? How can /audio/ be added to the speaches API?
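Until those routes exist in Speaches itself, one untested workaround is a path rewrite in front of it. Since this stack already fronts services with Caddy, something like the following could map Open WebUI's expected lookup paths onto the ones Speaches serves; the target paths here are assumptions based on the observation above, so verify them against $SPEACHES_BASE_URL/docs, and point Open WebUI's audio API Base URL at the proxy instead of the container:

```
https://<CADDY_NAME_IP>:<SPEACHES_PORT> {
    tls internal
    # Map Open WebUI's /v1/audio/{models,voices} lookups onto the routes
    # Speaches is reported to expose (assumed here; check under /docs).
    rewrite /v1/audio/models /v1/models
    rewrite /v1/audio/voices /v1/voices
    reverse_proxy speaches:8000
}
```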
@lpb313 this is the same conclusion I came to before finding this issue. While I'm really interested in trying @jrichey98's solution (which looks good, btw), I may have some helpful info on this matter. First, to get the right endpoints for Open-WebUI you'd have to modify the speaches code base yourself, add those endpoints, and optionally submit a pull request with your changes. The interesting thing is that even though I get the same error responses in my Open-WebUI and speaches container logs, voice chat actually works on my end. It's just really annoying that Open-WebUI can't be used to select the right models/voice on the fly that easily. The speaches docs have an API page where you can see which GET/POST methods are actually available, and which ones Open-WebUI expects that currently are not.
I believe the reason voice chat works for me is because I'm using av/harbor which does a quite good job at combining various services related to local-llm projects, do go have a look.
The gist is that I believe harbor works around this by using this bit in the open-webui compose file when run together with speaches:
av/harbor/open-webui/configs/config.speaches.json:
{
"audio": {
"tts": {
"openai": {
"api_base_url": "http://speaches:8000/v1",
"api_key": "sk-speaches"
},
"engine": "openai",
"model": "${HARBOR_SPEACHES_TTS_MODEL}",
"voice": "${HARBOR_SPEACHES_TTS_VOICE}"
},
"stt": {
"openai": {
"api_base_url": "http://speaches:8000/v1",
"api_key": "sk-speaches"
},
"engine": "openai",
"model": "${HARBOR_SPEACHES_STT_MODEL}"
}
}
}
There's more to it, and I've had to fiddle with speaches a lot, but I do recommend trying out the harbor project; just understand that this file in particular is 6 months old, and there are bits and pieces in the very large project that aren't updated. Importantly, the kokoro-utils volume bind in the speaches compose file is no longer necessary and should be commented out.
The speaches docs have the API page where you can see which GET/POST methods are actually available and the ones Open-WebUI are currently not.
You'll find them under $SPEACHES_BASE_URL/docs (e.g. localhost:8000)
First to get the right endpoints for open-webui you'd have to modify the speaches code base yourself and add those endpoints and optionally submit a pull request with your changes
Today, I will create the endpoints to make integration with Open WebUI
@fedirz when this is done will the model_aliases no longer be needed?
I followed that one a bit and realized why Open-WebUI was suggesting tts-1 and whisper. But I also think I found more references to those aliases; they might do more than simply provide those choices for the WebUI (e.g., logic for supported models). I didn't complete this digging, so please disregard my question above if it's not relevant for this issue.
Also thanks a whole bunch for your work and for sharing it.
@fedirz @realnikolaj @dgshue @lpb313
- A couple of month-and-a-half-long business trips to Japan & Germany: Complete
- Switch from vCenter/ESXi to Apache CloudStack/KVM in my home lab: Good enough for now.
- More experience with docker / containers (we still use almost all VMs at work): Under the belt.
- Didn't get to sleep till 4:30am last night, so I called into work: Check
I have solved most of the issues I had initially, so I'm going to work on an integration guide today. Hopefully I'll have something out by the end of the day.
Edit: Cleaning up thread.
Status:
- DRAFT 07-08-2025 - Speaches v0.7; needs updating to work with 0.8
- Planned 0.2: Add docker run commands.
- Planned 0.3: Update to work with 0.8
- Planned 0.4: Testing / Feedback prior to Release
- Planned 1.0: Initial Release
Open-WebUI Integration Guide with vLLM, Speaches, and ComfyUI
Revision / Date: 0.1.0 - 07-08-2025
Purpose: Set up and/or integrate Open-WebUI with external text and image generation, as well as Text-to-Speech (TTS) and Speech-to-Text (STT).
Justification: Local deployment may be preferred or mandated due to concerns over data privacy, proprietary information, and regulatory requirements. Open-WebUI performs better on a CPU-only image, but its built-in tools benefit from CUDA. To optimize performance and resource utilization, it's better to run these tools externally. This allows separate resource allocation to each tool, preventing resource contention and enabling more efficient processing.
Scope: Web User Interface (Web-UI), text generation via Large Language Models (LLM), image generation, Text-to-Speech (TTS), Speech-to-Text (STT).
TL;DR
- Place the files (compose.yaml, dockerfile-comfyui, .env) in a directory
- Update the hugging face token in the .env file
- Run: docker compose up -d
- Browse to: https://localhost:8080
Tools and Versions:
- Web-UI: Open-WebUI (ghcr.io/open-webui/open-webui:v0.6.15)
- TTS/STT: Speaches.ai (ghcr.io/speaches-ai/speaches:0.7-cuda)
- Text Generation/Auto-completion: VLLM (vllm/vllm-openai:v0.9.2)
- Image Generation: ComfyUI (v0.3.44)
- TLS/HTTPS Encryption: Caddy (caddy:2.10-alpine).
The only way to guarantee functionality is to pin versions in the guide. It's also common in production to test new versions prior to upgrading, rather than just rolling in updates. The following tools are functional with the 'latest' tag as of the revision date: Open-WebUI, vLLM, Caddy, ComfyUI. The guide needs to be updated for Speaches v0.8.
Honorable Mentions: LiteLLM
Tool Choice:
- Open-WebUI is a comprehensive UI designed to allow a person or organization to run LLMs & image generators locally.
- Speaches (formerly faster-whisper-server) is the current form of the TTS/STT engine that Open-WebUI uses internally.
- vLLM was designed to serve concurrent requests and is the base for most other projects that do the same. Most organizations require multi-user concurrency for their websites/applications.
- ComfyUI was chosen primarily based on the author's familiarity with it.
- Caddy is a simple-to-configure reverse proxy. TLS is required by Open-WebUI for STT & TTS to work, so a solution is required as part of this guide. Most companies will have their own PKI environment to replace this.
Notes:
- Open-WebUI stores its configuration in a database. Environment variables set values in the database only on initial load, during database creation. Prior Open-WebUI instances that already have a configuration database must be configured within the Web-UI.
- Open-WebUI will not transmit STT / TTS over an insecure link.
- Multiple applications use the huggingface cache directory, and all must have read/write permissions to it.
- In a production environment docker will need to be secured by running rootless (https://docs.docker.com/engine/security/rootless/), or by employing root remapping (https://docs.docker.com/engine/security/userns-remap/).
- Many will be too resource constrained to run multiple large models in VRAM, and will be best served selecting the best fit of each type (text, audio, image). LiteLLM allows access to multiple models from a single endpoint, but is outside the immediate scope of this guide (https://litellm.vercel.app/, https://github.com/BerriAI/litellm).
Deployment Methods Covered
- Open-WebUI UI configuration
- Docker Compose (Preferred to Docker)
- Docker
- Kubernetes (planned)
Configuration
Speaches.ai
In Open-WebUI, open ‘Admin Panel:Settings:Audio’
Settings:
- Speech-to-Text Engine & Text-to-Speech Engine: Both need to be set to OpenAI.
- API Base URL: Both need to be set to the name or IP of the speaches.ai server. Name is preferable, but IP also works.
- API Token: The API key configured on the speaches.ai server.
- STT Model: Systran/faster-distil-whisper-small.en
- TTS Voice: Can be set to any of the voices listed at: https://huggingface.co/rhasspy/piper-voices
- TTS Model: Should be set to: rhasspy/piper-voices
Make sure to download rhasspy/piper-voices into the speaches container by running:
docker exec -it <SPEACHES_CONTAINER_NAME> huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
ComfyUI
--- placeholder ---
Deployment
Docker Compose
Docker Compose is a way to create and deploy single or multi-container applications. Files are placed in a directory, then the application is managed with the following commands:
- Start: docker compose up -d
- Stop: docker compose stop
- Destroy: docker compose down (usually done after an updated image has been pulled)
Files:
- compose.yaml: This file describes the structure of the application. (reference)
- dockerfile-comfyui: This is a Dockerfile, a build script used to create a container image. (reference)
- .env: This is an environment file. Variables used in the Dockerfile during container buildout, as well as in the compose.yaml during provisioning, are stored here.
Click here to see file: .env
# GENERAL
WEBUI_NAME="Open WebUI"
WEBUI_URL="https://ai.yourdomain.com/"
INTERNAL_TOKEN="token-abc123"
HUGGING_FACE_HUB_TOKEN="<YOUR_HUGGINGFACE_TOKEN>"
# CADDY
CADDY_NAME_IP="localhost"
# PORTS
OPEN_WEBUI_PORT="8080"
VLLM_PORT="8000"
SPEACHES_PORT="8081"
COMFYUI_PORT="8082"
# GPU to use
# Example: For all gpus set to "all"; for first GPU only set to "0"; for second and third use "1,2".
GPU_VLLM_TEXT="all"
GPU_VLLM_EMBED="all"
GPU_SPEACHES="all"
GPU_COMFYUI="all"
# PATHS
PATH_OPEN_WEBUI="/data/open-webui"
PATH_COMFYUI="/data/comfyui"
PATH_HUGGINGFACE="/data/huggingface"
PATH_DOWNLOADED_MODELS="/data/models"
CADDY_PATH="/data/caddy"
# VERSIONS
VER_OPEN_WEBUI="v0.6.15"
VER_VLLM="v0.9.2"
VER_SPEACHES="0.7-cuda"
VER_COMFYUI="v0.3.44"
VER_CADDY="2.10-alpine"
Make sure to specify a valid Hugging Face token, and update any other settings (name/URL/ports) as desired.
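A quick way to catch placeholder values still left in .env before running docker compose up is a small check like this. This is a deliberately naive sketch; it just assumes the <...> placeholder convention used in this guide:

```python
def placeholder_keys(env_text: str) -> list[str]:
    """Return keys whose values still look like <PLACEHOLDER> entries."""
    flagged = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that isn't KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if "<" in value and ">" in value:
            flagged.append(key.strip())
    return flagged


sample = "\n".join([
    'HUGGING_FACE_HUB_TOKEN="<YOUR_HUGGINGFACE_TOKEN>"',
    'VLLM_PORT="8000"',
])
print(placeholder_keys(sample))  # ['HUGGING_FACE_HUB_TOKEN']
```

Run it against the real .env contents and fix anything it flags before bringing the stack up.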
Click here to see file: compose.yaml
services:
# Reverse Proxy applying TLS / HTTPS (Required for TTS & STT with Open-WebUI).
# Can remove if replacing PKI with existing infrastructure.
caddy:
restart: unless-stopped
image: caddy:${VER_CADDY}
cap_add:
- NET_ADMIN
ports:
- "${OPEN_WEBUI_PORT}:${OPEN_WEBUI_PORT}"
- "${OPEN_WEBUI_PORT}:${OPEN_WEBUI_PORT}/udp"
- "${VLLM_PORT}:${VLLM_PORT}"
- "${VLLM_PORT}:${VLLM_PORT}/udp"
- "${SPEACHES_PORT}:${SPEACHES_PORT}"
- "${SPEACHES_PORT}:${SPEACHES_PORT}/udp"
- "${COMFYUI_PORT}:${COMFYUI_PORT}"
- "${COMFYUI_PORT}:${COMFYUI_PORT}/udp"
volumes:
- ${CADDY_PATH}/data:/data
- ${CADDY_PATH}/config:/config
healthcheck:
test: ["CMD-SHELL", "curl -f http://open-webui:8080/health"]
interval: 10s
timeout: 3s
retries: 60
depends_on:
open-webui:
condition: service_healthy
restart: false
vllm-generate:
condition: service_healthy
restart: false
comfyui:
condition: service_healthy
restart: false
environment:
- CADDY_NAME_IP=${CADDY_NAME_IP}
- OPEN_WEBUI_PORT=${OPEN_WEBUI_PORT}
- VLLM_PORT=${VLLM_PORT}
- COMFYUI_PORT=${COMFYUI_PORT}
configs:
- target: /etc/caddy/Caddyfile
source: Caddyfile
# Open-WebUI - UI Frontend for Text / Image Generation / TTS&STT / RAG.
open-webui:
restart: unless-stopped
image: "ghcr.io/open-webui/open-webui:${VER_OPEN_WEBUI}"
user: root
volumes:
- ${PATH_OPEN_WEBUI}:/app/backend/data
healthcheck:
test: ["CMD-SHELL", "curl -f http://open-webui:8080/health -H 'Authorization: Bearer ${INTERNAL_TOKEN}'"]
interval: 10s
timeout: 5s
retries: 60
depends_on:
vllm-generate:
condition: service_healthy
restart: false
vllm-embed:
condition: service_healthy
restart: false
speaches:
condition: service_healthy
restart: false
comfyui:
condition: service_healthy
restart: false
environment:
# Environmental Variables: https://docs.openwebui.com/getting-started/env-configuration/
# General
- WEBUI_NAME=${WEBUI_NAME}
- WEBUI_URL=${WEBUI_URL}
- PORT=8080
- DEFAULT_LOCALE=en
- ENABLE_SIGNUP=true
- DEFAULT_USER_ROLE=user
# API
- ENABLE_OLLAMA_API=false
- ENABLE_OPENAI_API=true
- OPENAI_API_BASE_URL=http://vllm-generate:8000/v1
- OPENAI_API_KEY=${INTERNAL_TOKEN}
# MISC
- ENABLE_AUTOCOMPLETE_GENERATION=true
- AUTOCOMPLETE_GENERATION_INPUT_MAX_LENGTH=-1
- ENABLE_EVALUATION_ARENA_MODELS=true
- ENABLE_COMMUNITY_SHARING=true
- ENABLE_TAGS_GENERATION=true
# AUTH
# SECURITY
- ENABLE_FORWARD_USER_INFO_HEADERS=false
- ENABLE_RAG_LOCAL_WEB_FETCH=true
- WEBUI_AUTH=true
- OFFLINE_MODE=true
- SAFE_MODE=false
- RAG_EMBEDDING_MODEL_TRUST_REMOTE_CODE=true
- RAG_RERANKING_MODEL_TRUST_REMOTE_CODE=true
- RAG_EMBEDDING_MODEL_AUTO_UPDATE=true
- RAG_RERANKING_MODEL_AUTO_UPDATE=true
- WHISPER_MODEL_AUTO_UPDATE=false
# RAG Documents
- RAG_EMBEDDING_ENGINE=openai
- ENABLE_RAG_HYBRID_SEARCH=false
- RAG_EMBEDDING_MODEL=ibm-granite/granite-embedding-125m-english
- RAG_TOP_K=12
- RAG_RELEVANCE_THRESHOLD=0.1
- RAG_TEXT_SPLITTER=character
- CHUNK_SIZE=1000
- CHUNK_OVERLAP=100
- PDF_EXTRACT_IMAGES=false
- RAG_OPENAI_API_BASE_URL=http://vllm-embed:8000/v1
- RAG_OPENAI_API_KEY=${INTERNAL_TOKEN}
- RAG_EMBEDDING_OPENAI_BATCH_SIZE=32
# RAG Web
- ENABLE_RAG_WEB_SEARCH=true
- RAG_WEB_SEARCH_ENGINE=duckduckgo
# Audio
- AUDIO_STT_ENGINE=openai
- AUDIO_STT_MODEL=Systran/faster-distil-whisper-small.en
- AUDIO_STT_OPENAI_API_BASE_URL=http://speaches:8000/v1
- AUDIO_STT_OPENAI_API_KEY=${INTERNAL_TOKEN}
- AUDIO_TTS_ENGINE=openai
- AUDIO_TTS_MODEL=rhasspy/piper-voices
- AUDIO_TTS_OPENAI_API_BASE_URL=http://speaches:8000/v1
- AUDIO_TTS_OPENAI_API_KEY=${INTERNAL_TOKEN}
- AUDIO_TTS_VOICE=en_US-amy-medium
# Image Generation
- ENABLE_IMAGE_GENERATION=true
- IMAGE_GENERATION_ENGINE=comfyui
- COMFYUI_BASE_URL=http://comfyui:8188
- IMAGE_SIZE=720x480
# AUTH
- ENABLE_OAUTH_SIGNUP=false
- ENABLE_LDAP=false
# Permissions - Set as desired
- USER_PERMISSIONS_WORKSPACE_MODELS_ACCESS=false
- USER_PERMISSIONS_WORKSPACE_KNOWLEDGE_ACCESS=true
- USER_PERMISSIONS_WORKSPACE_PROMPTS_ACCESS=true
- USER_PERMISSIONS_WORKSPACE_TOOLS_ACCESS=false
- USER_PERMISSIONS_CHAT_FILE_UPLOAD=true
# vLLM Instance for Text Generation of Large Language Models
vllm-generate:
ipc: host
restart: unless-stopped
image: "vllm/vllm-openai:${VER_VLLM}"
runtime: nvidia
deploy:
resources:
reservations:
devices:
- driver: nvidia
# count: 1
device_ids: ["${GPU_VLLM_TEXT}"]
capabilities: [gpu]
user: root
volumes:
- ${PATH_HUGGINGFACE}:/root/.cache/huggingface
- ${PATH_DOWNLOADED_MODELS}:/models
healthcheck:
test: ["CMD-SHELL", "curl -f http://vllm-generate:8000/health -H 'Authorization: Bearer ${INTERNAL_TOKEN}'"]
interval: 10s
timeout: 5s
retries: 60
environment:
- NVIDIA_VISIBLE_DEVICES=${GPU_VLLM_TEXT}
- HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
- VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
# For best compatibility: VLLM_ATTENTION_BACKEND=XFORMERS VLLM_USE_V1=0
# For best performance: VLLM_USE_V1=1
# - VLLM_ATTENTION_BACKEND=XFORMERS
# - VLLM_USE_V1=0
command:
# If using local model reference local filepath (ex: /models/Meta-Llama-3.1-8B-Instruct).
# (Huggingface example: RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8)
# (Local model example: /models/meta-llama-3.1-8b-instruct-quantized.w8a8)
- --model
- RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8
- --task
- generate
- --api-key
- ${INTERNAL_TOKEN}
# Context Size: Larger contexts use more ram but can handle longer conversations.
- --max-model-len
- "8192"
# KV cache can be set by either specifying (gpu blocks & block size), or (gpu memory %)
# KV cache = (block-size * num-gpu-blocks)
# KV cache must be >= max-model-len (sequence length), often slightly greater depending on model/engine.
- --block-size
- "16"
- --num-gpu-blocks-override
- "640"
# GPU memory %
# - --gpu-memory-utilization
# - ".95"
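As a sanity check on the numbers above: the KV cache capacity in tokens is block-size times num-gpu-blocks, and it has to cover max-model-len. With the values in this command:

```python
# Values taken from the vllm-generate flags above.
block_size = 16
num_gpu_blocks = 640
max_model_len = 8192

# KV cache capacity in tokens = block-size * num-gpu-blocks.
kv_cache_tokens = block_size * num_gpu_blocks
print(kv_cache_tokens)  # 10240

# 10240 >= 8192, so the cache can hold at least one full-length sequence.
assert kv_cache_tokens >= max_model_len
```

If you raise max-model-len, raise num-gpu-blocks-override (or switch to gpu-memory-utilization) accordingly, or vLLM will refuse to serve full-length sequences.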
# vLLM Instance for embedding Models
vllm-embed:
ipc: host
restart: unless-stopped
image: "vllm/vllm-openai:${VER_VLLM}"
runtime: nvidia
deploy:
resources:
reservations:
devices:
- driver: nvidia
# count: 1
device_ids: ["${GPU_VLLM_EMBED}"]
capabilities: [gpu]
user: root
volumes:
- ${PATH_HUGGINGFACE}:/root/.cache/huggingface
- ${PATH_DOWNLOADED_MODELS}:/models
healthcheck:
test: ["CMD-SHELL", "curl -f http://vllm-embed:8000/health -H 'Authorization: Bearer ${INTERNAL_TOKEN}'"]
interval: 10s
timeout: 5s
retries: 30
environment:
- NVIDIA_VISIBLE_DEVICES=${GPU_VLLM_EMBED}
- HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
- VLLM_ATTENTION_BACKEND=XFORMERS
- VLLM_USE_V1=0
command:
- --model
- ibm-granite/granite-embedding-125m-english
- --task
- embed
- --api-key
- ${INTERNAL_TOKEN}
- --max-seq-len-to-capture
- "512"
- --max-num-seqs
- "32"
- --block-size
- "16"
- --num-gpu-blocks-override
- "32"
# Speaches.ai for Text-to-Speech & Speech-to-Text
speaches:
restart: unless-stopped
image: "ghcr.io/speaches-ai/speaches:${VER_SPEACHES}"
runtime: nvidia
deploy:
resources:
reservations:
devices:
- driver: nvidia
# count: 1
device_ids: ["${GPU_SPEACHES}"]
capabilities: [gpu]
user: root
volumes:
- ${PATH_HUGGINGFACE}:/home/ubuntu/.cache/huggingface
- ${PATH_DOWNLOADED_MODELS}:/models
healthcheck:
test: ["CMD-SHELL", "curl -f http://speaches:8000/health -H 'Authorization: Bearer ${INTERNAL_TOKEN}'"]
interval: 10s
timeout: 5s
retries: 60
environment:
- NVIDIA_VISIBLE_DEVICES=${GPU_SPEACHES}
- HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
- API_KEY=${INTERNAL_TOKEN}
- ENABLE_UI=false
- CHAT_COMPLETION_BASE_URL=http://vllm-generate:8000/v1
- CHAT_COMPLETION_API_KEY=${INTERNAL_TOKEN}
- DEFAULT_LANGUAGE=en
- COMPUTE_TYPE__USE_BATCHED_MODE=false
- WHISPER__COMPUTE_TYPE=int8
- WHISPER__MODEL=Systran/faster-distil-whisper-small.en
- TRANSCRIPTION_BASE_URL=http://speaches:8000/v1/audio/transcriptions
- TRANSCRIPTION_API_KEY=${INTERNAL_TOKEN}
- SPEECH_EXTRA_BODY__SAMPLE_RATE=22050
- SPEECH_BASE_URL=http://speaches:8000/v1/audio/speech
- SPEECH_API_KEY=${INTERNAL_TOKEN}
- SPEECH_MODEL=rhasspy/piper-voices
command: /bin/bash -c "huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json' && /opt/nvidia/nvidia_entrypoint.sh uvicorn --factory speaches.main:create_app"
# ComfyUI for Image Generation
comfyui:
restart: unless-stopped
build:
context: .
dockerfile: dockerfile-comfyui
runtime: nvidia
deploy:
resources:
reservations:
devices:
- driver: nvidia
# count: 1
device_ids: ["${GPU_COMFYUI}"]
capabilities: [gpu]
volumes:
- ${PATH_COMFYUI}/models:/opt/ComfyUI/models
- ${PATH_COMFYUI}/output:/opt/ComfyUI/output
healthcheck:
test: ["CMD-SHELL", "curl -f http://comfyui:8188"]
interval: 10s
timeout: 5s
retries: 30
environment:
- NVIDIA_VISIBLE_DEVICES=${GPU_COMFYUI}
- HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
command:
- --gpu-only
# Configuration for Caddy
# Can remove if PKI is not needed
configs:
Caddyfile:
name: Caddyfile
content: |
https://${CADDY_NAME_IP}:${OPEN_WEBUI_PORT} {
tls internal
reverse_proxy open-webui:8080
}
https://${CADDY_NAME_IP}:${VLLM_PORT} {
tls internal
reverse_proxy vllm-generate:8000
}
https://${CADDY_NAME_IP}:${SPEACHES_PORT} {
tls internal
reverse_proxy speaches:8000
}
https://${CADDY_NAME_IP}:${COMFYUI_PORT} {
tls internal
reverse_proxy comfyui:8188
}
Click here to see file: dockerfile-comfyui
FROM ubuntu:22.04
# Declare the build arg used by the git clone below; default matches VER_COMFYUI in .env.
ARG VER_COMFYUI=v0.3.44
EXPOSE 8188
WORKDIR /opt/ComfyUI
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC
ENV PIP_BREAK_SYSTEM_PACKAGES=1
RUN apt update && apt install tzdata curl git git-lfs python3 python-is-python3 python3-pip -y --no-install-recommends
# VER_COMFYUI must be declared as a build ARG; compose environment variables are not visible at image build time
ARG VER_COMFYUI
RUN git clone https://github.com/comfyanonymous/ComfyUI --branch ${VER_COMFYUI} /opt/ComfyUI
RUN pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "main.py", "--listen", "0.0.0.0"]
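Because Docker only exposes `${VER_COMFYUI}` at build time when it is passed as a build argument (together with an `ARG VER_COMFYUI` line in the Dockerfile), the comfyui service's build section would need an args entry. A sketch, assuming VER_COMFYUI is defined in the .env file:

```yaml
# Sketch: pass the ComfyUI version from .env into the image build.
  comfyui:
    build:
      context: .
      dockerfile: dockerfile-comfyui
      args:
        VER_COMFYUI: ${VER_COMFYUI}
```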
Just pushed out v0.8.2 release, which includes OpenWebUI-compatible voices and model routes. Thanks, everyone, for the community effort here!
@jrichey98 It's a bit late for me already, so I won't be able to review what you've written today, but I'll make sure to get back to you.
Again, big thanks to everyone!
Copy. I can do an `rm -rf /data/*`, run the compose files above, and everything except ComfyUI is configured and working out of the box (including Speaches). But it only works with Speaches 0.7; something changed in 0.8, so now I need to figure out why the newer version isn't working.
I've caught up a bit and have time to dedicate now. I'll keep working and refining throughout the week and weekend. I'm starting with the full integration guide because I recently helped deploy something very similar to the stack above, and the users are really happy with it, so it was easier to polish up what I already had: an integration guide for a full application.
If a stripped-down guide is needed, it can be extracted from the larger one. But I think there's value in giving someone a working full-stack solution they can use as a starting point and modify to their preference.
I found that when using TTS in the Speaches UI directly, everything functions properly:
2025-10-11 19:50:55,354:INFO:httpx:_send_single_request:1740:HTTP Request: POST https://tts.domain.com/v1/audio/speech "HTTP/1.1 200 OK"
Using TTS from Open WebUI fails, apparently because the base URL is missing:
INFO: 10.0.0.20:42300 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
I have this environment variable set:
LOOPBACK_HOST_URL: https://tts.domain.com
I also tried setting these, but they seemingly did nothing:
SPEACHES_BASE_URL
SPEECH_BASE_URL
What am I missing here?
@TheDarkula see maybe #475
Try plain http instead of https, unless you already have certificates sorted out. Then try adding the ALLOW_ORIGINS env variable:
LOOPBACK_HOST_URL=http://<host>:<port>
ALLOW_ORIGINS=["*"]
Avoid "*" if you can (it's acceptable within a tailnet, for example); otherwise allow only your subnets.
@realnikolaj I tried using a kubernetes service DNS hostname as well, but Open-WebUI still hits the same issue.
Could this be related to the Open-WebUI issue here, or this speaches issue?
Did you try the older WebUI version?
I've pretty much ironed the bugs out of my stack. If you want to see a full Caddy / Open-WebUI / LiteLLM / vLLM / Speaches.ai / ComfyUI stack, the files are attached:
- Extract files to a folder.
- Replace the huggingface token in the .env with your own.
- Optional: Replace the certs with your own (the current ones are from a private CA for localhost and my site).
- Run `docker compose up -d`.
- If you change the context size in the .env file, make sure the number of KV-cache blocks is larger than context / 16 (a little padding may be required, and more is better; this keeps VRAM usage in check).
Edit: Working with the latest versions as of 10-12-2025. I'll work on finishing up the guide soon. Eventually I'll port this to Kubernetes.
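The block-count rule of thumb from the list above can be sketched in shell. The 16-token block size matches vLLM's default KV-cache block size; the 10% padding factor is an arbitrary assumption standing in for the "a little padding, the more the better" advice:

```shell
# Sketch: minimum vLLM KV-cache blocks for a given context length.
# CONTEXT_LEN is an example value; BLOCK_SIZE is vLLM's default of 16 tokens.
CONTEXT_LEN=32768
BLOCK_SIZE=16
# Ceiling division: blocks needed to cover the full context.
MIN_BLOCKS=$(( (CONTEXT_LEN + BLOCK_SIZE - 1) / BLOCK_SIZE ))
# Add ~10% padding (arbitrary choice) to leave VRAM headroom.
PADDED_BLOCKS=$(( MIN_BLOCKS + MIN_BLOCKS / 10 ))
echo "minimum blocks: $MIN_BLOCKS, with padding: $PADDED_BLOCKS"
```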
@realnikolaj I tried open-webui version 0.6.28, and speaches exhibits the same behaviour of stripping the host on external calls.
@jrichey98 Your configuration is essentially the same as mine, but speaches is removing the hostname coming from open-webui, which seems to be causing the failure.
Just noticed this as well. Previously things were working but I now get the same issue:
2025-11-05T01:35:38.097638216Z INFO: 172.18.0.1:41624 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
2025-11-05T01:35:38.241839206Z INFO: 172.18.0.1:41628 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
2025-11-05T01:35:38.381682658Z INFO: 172.18.0.1:41636 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
2025-11-05T01:35:38.523681703Z INFO: 172.18.0.1:41642 - "POST /v1/audio/speech HTTP/1.1" 404 Not Found
Just wanted to note again that the exact same config previously worked without issue.
EDIT:
After seeing #475 and adding:
LOOPBACK_HOST_URL=http://IP:8000
ALLOW_ORIGINS=["*"]
Everything is working again.
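For anyone landing here later, the working fix above translates to the following compose environment entries on the speaches service. The `IP:8000` value is the placeholder from the comment above, not a tested value:

```yaml
# Sketch: speaches env entries that resolved the 404 on /v1/audio/speech.
  speaches:
    environment:
      - LOOPBACK_HOST_URL=http://IP:8000
      - ALLOW_ORIGINS=["*"]
```

As noted earlier in the thread, prefer an explicit origin list over "*" outside a trusted network.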