Fish Speech API Webserver in Docker

An OpenAI-like voice generation server based on fish-speech-1.5.

Supports text-to-speech and voice style transfer via reference audio samples.

Requirements

- Nvidia GPU

For Docker-way:
- Nvidia Docker Runtime
- Docker
- Docker Compose

For Manual Setup:
- Python 3.12
- Python venv

🔧 Quick Start

Clone the repo first:

git clone --recurse-submodules git@github.com:EvilFreelancer/fish-speech-api.git
cd fish-speech-api

Docker-way

cp docker-compose.dist.yml docker-compose.yml
docker compose up -d

Enter the container:

docker compose exec api bash

Download the model:

huggingface-cli download fishaudio/fish-speech-1.5 --local-dir models/fish-speech-1.5/

Manual Setup

Install system dependencies (Debian/Ubuntu):

apt install cmake portaudio19-dev

Set up a virtual environment and install dependencies:

python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Download the model:

huggingface-cli download fishaudio/fish-speech-1.5 --local-dir models/fish-speech-1.5/

Run the API server:

python main.py
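
Startup may take some time while the model loads, so a client script may want to wait for the port to open before sending requests. The following is a minimal sketch, assuming the server listens on localhost:8000 as in the curl examples in this README; `wait_for_port` is a hypothetical helper and not part of the project.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000,
                  timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves the port is open,
            # not that the model has finished loading.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: retry after a short pause.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port()` with no arguments polls localhost:8000 for up to 60 seconds; adjust the host and port if the server is mapped elsewhere.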

🧪 Testing the API

Generate speech with default voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -F model="fish-speech-1.5" \
  -F input="Hello, this is a test of Fish Speech API" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Hello, this is a test of Fish Speech API"
  }' \
  --output "speech.wav"
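
The same JSON request can be sent from Python with only the standard library. This is a rough sketch, not an official client: `BASE_URL`, `build_speech_request` and `synthesize` are hypothetical names, and the response body is assumed to be the audio file itself, as the `--output "speech.wav"` flag above suggests.

```python
import json
import urllib.request

# Assumed base URL, taken from the curl examples above.
BASE_URL = "http://localhost:8000"


def build_speech_request(text: str, model: str = "fish-speech-1.5",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /audio/speech."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "speech.wav") -> None:
    """Send the request and write the response body to out_path."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        # Assumption: the response body is the generated audio.
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

With the server running, `synthesize("Hello, this is a test of Fish Speech API")` should produce the same `speech.wav` as the curl call.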

Generate speech with example voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -F model="fish-speech-1.5" \
  -F voice="english-nice" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "voice": "english-nice",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets."
  }' \
  --output "speech.wav"

Generate speech with reference voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -H 'Content-Type: multipart/form-data' \
  -F model="fish-speech-1.5" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  -F reference_audio="@voice-viola.wav" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets.",
      "reference_audio": "=base64..."
  }' \
  --output "speech.wav"
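
For the JSON variant the reference audio has to be embedded as text. The sketch below shows one way to build such a body in Python, assuming the server accepts plain standard base64 of the file bytes; the exact encoding the API expects is not spelled out here, so treat that as an assumption. `encode_reference_audio` and `build_reference_payload` are hypothetical helper names.

```python
import base64
import json
from pathlib import Path


def encode_reference_audio(path: str) -> str:
    """Return the file's bytes as a standard base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_reference_payload(text: str, audio_path: str,
                            model: str = "fish-speech-1.5") -> str:
    """Build the JSON body for a request that uses audio_path as the reference voice."""
    payload = {
        "model": model,
        "input": text,
        # Assumption: plain base64 is what the server expects in this field.
        "reference_audio": encode_reference_audio(audio_path),
    }
    return json.dumps(payload)
```

The returned string can be saved to a file and passed to curl with `-d @payload.json`, which avoids pasting a long base64 string on the command line.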

Advanced settings

curl http://localhost:8000/audio/speech \
  -X POST \
  -H 'Content-Type: multipart/form-data' \
  -F model="fish-speech-1.5" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  -F top_p="0.1" \
  -F repetition_penalty="1.3" \
  -F temperature="0.75" \
  -F chunk_length="150" \
  -F max_new_tokens="768" \
  -F seed="42" \
  -F reference_audio="@voice-viola.wav" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets.",
      "top_p": "0.1",
      "repetition_penalty": "1.3",
      "temperature": "0.75",
      "chunk_length": "150",
      "max_new_tokens": "768",
      "seed": "42",
      "reference_audio": "=base64..."
  }' \
  --output "speech.wav"
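
When these settings are tuned from a script, it can help to keep them in one place and send only the ones that are set. A minimal sketch, assuming unset fields fall back to server defaults (the simpler requests above omit them); `SamplingOptions` and `build_payload` are hypothetical names, and the field names are copied from the request above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SamplingOptions:
    # Field names mirror the curl request above; None means "do not send".
    top_p: Optional[float] = None
    repetition_penalty: Optional[float] = None
    temperature: Optional[float] = None
    chunk_length: Optional[int] = None
    max_new_tokens: Optional[int] = None
    seed: Optional[int] = None


def build_payload(text: str, options: SamplingOptions,
                  model: str = "fish-speech-1.5") -> str:
    """Build a JSON body containing only the options that were set."""
    payload = {"model": model, "input": text}
    for name, value in asdict(options).items():
        if value is not None:
            # Values are sent as strings to match the JSON example above.
            payload[name] = str(value)
    return json.dumps(payload)
```

For example, `build_payload("Hello", SamplingOptions(temperature=0.75, seed=42))` yields a body with just `model`, `input`, `temperature` and `seed`.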
