
Text-to-speech API compatible with OpenAI's TTS endpoint, using Microsoft Edge TTS to generate speech locally for free

OpenAI-Compatible Edge-TTS API 🗣️


This project provides a local, OpenAI-compatible text-to-speech (TTS) API using edge-tts. It emulates the OpenAI TTS endpoint (/v1/audio/speech), enabling users to generate speech from text with various voice options and playback speeds, just like the OpenAI API.

edge-tts uses Microsoft Edge's online text-to-speech service, so it is completely free.
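Under the hood, the synthesis is done by the edge-tts Python package. For reference, here is a minimal standalone sketch of that library, independent of this server (the voice name is just an example):

import asyncio
import edge_tts

async def main() -> None:
    # Synthesize a short phrase directly with the edge-tts library
    communicate = edge_tts.Communicate("Hello from Edge TTS!", "en-US-AndrewNeural")
    await communicate.save("hello.mp3")

asyncio.run(main())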

View this project on Docker Hub

Please ⭐️ star this repo if you find it helpful

Features

  • OpenAI-Compatible Endpoint: /v1/audio/speech with similar request structure and behavior.
  • Supported Voices: Maps OpenAI voices (alloy, echo, fable, onyx, nova, shimmer) to edge-tts equivalents.
  • Flexible Formats: Supports multiple audio formats (mp3, opus, aac, flac, wav, pcm).
  • Adjustable Speed: Option to modify playback speed (0.25x to 4.0x).
  • Optional Direct Edge-TTS Voice Selection: Use either OpenAI voice mappings or specify any edge-tts voice directly.

Getting Started

Prerequisites

  • Docker (recommended): Docker and Docker Compose for containerized setup.
  • Python (optional): For local development, install dependencies in requirements.txt.
  • ffmpeg (optional): Needed for audio format conversion; not required if you stick to mp3.

Installation

  1. Clone the Repository:
git clone https://github.com/travisvn/openai-edge-tts.git
cd openai-edge-tts
  2. Environment Variables: Create a .env file in the root directory with the following variables:
API_KEY=your_api_key_here
PORT=5050

DEFAULT_VOICE=en-US-AndrewNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.2

DEFAULT_LANGUAGE=en-US

REQUIRE_API_KEY=True

Or, copy the provided .env.example with the following command:

cp .env.example .env
  3. Run with Docker Compose (recommended):
docker compose up --build

(Note: docker-compose is not the same as docker compose)

Add -d to run Docker Compose in "detached mode", so it runs in the background and frees up your terminal.

docker compose up -d

Alternatively, run directly with Docker:

docker build -t openai-edge-tts .
docker run -p 5050:5050 --env-file .env openai-edge-tts

To run the container in the background, add the -d flag to the docker run command:

docker run -d -p 5050:5050 --env-file .env openai-edge-tts
  4. Access the API: Your server will be accessible at http://localhost:5050.

Running with Python

If you prefer to run this project directly with Python, follow these steps to set up a virtual environment, install dependencies, and start the server.

1. Clone the Repository

git clone https://github.com/travisvn/openai-edge-tts.git
cd openai-edge-tts

2. Set Up a Virtual Environment

Create and activate a virtual environment to isolate dependencies:

# For macOS/Linux
python3 -m venv venv
source venv/bin/activate

# For Windows
python -m venv venv
venv\Scripts\activate

3. Install Dependencies

Use pip to install the required packages listed in requirements.txt:

pip install -r requirements.txt

4. Configure Environment Variables

Create a .env file in the root directory and set the following variables:

API_KEY=your_api_key_here
PORT=5050

DEFAULT_VOICE=en-US-AndrewNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.2

DEFAULT_LANGUAGE=en-US

REQUIRE_API_KEY=True

5. Run the Server

Once configured, start the server with:

python app/server.py

The server will start running at http://localhost:5050.

6. Test the API

You can now interact with the API at http://localhost:5050/v1/audio/speech and other available endpoints. See the Usage section for request examples.
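As a quick smoke test, the sketch below (assuming the PORT and API_KEY values from the .env example above) posts a short sentence and saves the result to speech.mp3:

import requests

url = "http://localhost:5050/v1/audio/speech"
headers = {"Authorization": "Bearer your_api_key_here"}
payload = {
    "input": "The server is up and running.",
    "voice": "en-US-AndrewNeural",
    "response_format": "mp3",
}

# Request the audio and write the binary response to a file
response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

with open("speech.mp3", "wb") as f:
    f.write(response.content)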

Usage

Endpoint: /v1/audio/speech

Generates audio from the input text. Available parameters:

Required Parameter:

  • input (string): The text to be converted to audio (up to 4096 characters).

Optional Parameters:

  • model (string): Set to "tts-1" or "tts-1-hd" (default: "tts-1").
  • voice (string): One of the OpenAI-compatible voices (alloy, echo, fable, onyx, nova, shimmer) or any valid edge-tts voice (default: "en-US-AndrewNeural").
  • response_format (string): Audio format. Options: mp3, opus, aac, flac, wav, pcm (default: mp3).
  • speed (number): Playback speed (0.25 to 4.0). Default is 1.2.

Example request with curl and saving the output to an mp3 file:

curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
    "voice": "echo",
    "response_format": "mp3",
    "speed": 1.2
  }' \
  --output speech.mp3

Or, to be in line with the OpenAI API endpoint parameters:

curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
    "voice": "alloy"
  }' \
  --output speech.mp3

And an example of a language other than English:

curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "ใ˜ใ‚ƒใ‚ใ€่กŒใใ€‚้›ป่ปŠใฎๆ™‚้–“ใ€่ชฟในใฆใŠใใ‚ˆใ€‚",
    "voice": "ja-JP-KeitaNeural"
  }' \
  --output speech.mp3
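Because the endpoint mirrors OpenAI's, the official openai Python package can be pointed at this server as well. A minimal sketch, assuming the package is installed and the server is running on the default port:

from openai import OpenAI

# Point the official client at the local server instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:5050/v1",
    api_key="your_api_key_here",
)

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
) as response:
    response.stream_to_file("speech.mp3")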

Additional Endpoints

  • POST/GET /v1/models: Lists available TTS models.
  • POST/GET /v1/voices: Lists edge-tts voices for a given language / locale.
  • POST/GET /v1/voices/all: Lists all edge-tts voices, with language support information.
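For example, a quick way to list the models and the full voice catalogue with the requests library (response shapes depend on the server; this is only a sketch):

import requests

base = "http://localhost:5050"
headers = {"Authorization": "Bearer your_api_key_here"}

# List the available TTS models
print(requests.get(f"{base}/v1/models", headers=headers).json())

# List all edge-tts voices with their language support information
print(requests.get(f"{base}/v1/voices/all", headers=headers).json())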

Contributing

Contributions are welcome! Please fork the repository and create a pull request for any improvements.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0) and is intended for personal use. For enterprise or non-personal use of openai-edge-tts, contact me at [email protected]


Example Use Case

Open WebUI

Open up the Admin Panel and go to Settings -> Audio

Below is a screenshot of the correct configuration for using this project as a substitute for the OpenAI endpoint.

Screenshot of Open WebUI Admin Settings for Audio adding the correct endpoints for this project

[!NOTE] View the official docs for Open WebUI integration with OpenAI Edge TTS

AnythingLLM

In version 1.6.8, AnythingLLM added support for "generic OpenAI TTS providers", which means this project can be used as the TTS provider in AnythingLLM.

Open up settings and go to Voice & Speech (Under AI Providers)

Below is a screenshot of the correct configuration for using this project as a substitute for the OpenAI endpoint.

Screenshot of AnythingLLM settings for Voice adding the correct endpoints for this project


Quick Info

  • your_api_key_here never needs to be replaced: no "real" API key is required, so use whatever string you'd like.
  • The quickest way to get this up and running is to install Docker and run the command below:
docker run -d -p 5050:5050 -e API_KEY=your_api_key_here -e PORT=5050 travisvn/openai-edge-tts:latest

Voice Samples 🎙️

Play voice samples and see all available Edge TTS voices