docker-fish-speech-server
OpenAPI-like API server for voice generation (TTS), based on the fish-speech-1.5 model.
Fish Speech API Webserver in Docker
Supports text-to-speech and voice style transfer via reference audio samples.
Requirements
- NVIDIA GPU
- For Docker-way
  - NVIDIA Docker Runtime
  - Docker
  - Docker Compose
- For Manual Setup
  - Python 3.12
  - Python Venv
🔧 Quick Start
Clone the repo first:
git clone --recurse-submodules git@github.com:EvilFreelancer/fish-speech-api.git
cd fish-speech-api
Docker-way
cp docker-compose.dist.yml docker-compose.yml
docker compose up -d
Enter the container:
docker compose exec api bash
Download the model:
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir models/fish-speech-1.5/
Manual Setup
Install the system packages (Debian/Ubuntu):
apt install cmake portaudio19-dev
Set up a virtual environment, activate it, and install dependencies:
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Download the model:
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir models/fish-speech-1.5/
Run the API server:
python main.py
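The model can take a while to load, so it may help to wait until the server accepts connections before sending requests. The sketch below is not part of this repository: `wait_for_port` is a hypothetical helper, and the host and port (`localhost:8000`) are assumptions taken from the curl examples in this README. It only checks that the TCP port is open, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=60.0, interval=1.0):
    """Poll a TCP port until it accepts a connection or the timeout expires.

    Returns True if a connection succeeded, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port(timeout=120)` could be called from a test script right after starting `python main.py` in the background.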
🧪 Testing the API
Generate speech with the default voice
curl http://localhost:8000/audio/speech \
-X POST \
-F model="fish-speech-1.5" \
-F input="Hello, this is a test of Fish Speech API" \
--output "speech.wav"
In JSON format:
curl http://localhost:8000/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "fish-speech-1.5",
"input": "Hello, this is a test of Fish Speech API"
}' \
--output "speech.wav"
Generate speech with an example voice
curl http://localhost:8000/audio/speech \
-X POST \
-F model="fish-speech-1.5" \
-F voice="english-nice" \
-F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
--output "speech.wav"
In JSON format:
curl http://localhost:8000/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "fish-speech-1.5",
"voice": "english-nice",
"input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets."
}' \
--output "speech.wav"
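The JSON requests above can also be sent from Python. This is a minimal sketch using only the standard library, not an official client: `build_speech_request` and `synthesize` are hypothetical helper names, and the base URL is assumed from the curl examples. The field names (`model`, `input`, `voice`) are the ones shown above.

```python
import json
import urllib.request

# Assumption: the server listens on localhost:8000, as in the curl examples.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, voice=None, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for /audio/speech."""
    payload = {"model": "fish-speech-1.5", "input": text}
    if voice is not None:
        # Omit the field entirely to use the default voice.
        payload["voice"] = voice
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.wav", voice=None, base_url=BASE_URL):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice=voice, base_url=base_url)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

With the server running, `synthesize("Hello, this is a test of Fish Speech API", voice="english-nice")` should write the response body to `speech.wav`.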
Generate speech with a reference voice
curl http://localhost:8000/audio/speech \
-X POST \
-H 'Content-Type: multipart/form-data' \
-F model="fish-speech-1.5" \
-F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
-F reference_audio="@voice-viola.wav" \
--output "speech.wav"
In JSON format:
curl http://localhost:8000/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "fish-speech-1.5",
"input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets.",
"reference_audio": "=base64..."
}' \
--output "speech.wav"
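In the JSON variant the reference audio travels as a base64 string rather than a file upload. The sketch below shows one way to prepare that field in Python. It assumes plain standard base64 of the raw WAV bytes; the exact format the server expects (including the meaning of the `=base64...` placeholder above) is not specified in this README, so verify it against the server before relying on it. The helper names are hypothetical.

```python
import base64


def encode_reference_audio(wav_bytes):
    """Return the raw audio bytes as an ASCII base64 string."""
    return base64.b64encode(wav_bytes).decode("ascii")


def build_reference_payload(text, wav_bytes):
    """Build the JSON body for a request that carries reference audio.

    Assumption: the server accepts standard base64 in "reference_audio".
    """
    return {
        "model": "fish-speech-1.5",
        "input": text,
        "reference_audio": encode_reference_audio(wav_bytes),
    }
```

A typical call would read the sample first, for example `build_reference_payload("Hello", open("voice-viola.wav", "rb").read())`, and then send the result with `json.dumps` as the request body.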
Advanced settings
curl http://localhost:8000/audio/speech \
-X POST \
-H 'Content-Type: multipart/form-data' \
-F model="fish-speech-1.5" \
-F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
-F top_p="0.1" \
-F repetition_penalty="1.3" \
-F temperature="0.75" \
-F chunk_length="150" \
-F max_new_tokens="768" \
-F seed="42" \
-F reference_audio="@voice-viola.wav" \
--output "speech.wav"
In JSON format:
curl http://localhost:8000/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "fish-speech-1.5",
"input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets.",
"top_p": "0.1",
"repetition_penalty": "1.3",
"temperature": "0.75",
"chunk_length": "150",
"max_new_tokens": "768",
"seed": "42",
"reference_audio": "=base64..."
}' \
--output "speech.wav"
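When the advanced settings are set from code, a small builder can catch misspelled parameter names before the request is sent. This is a sketch under stated assumptions: `build_advanced_payload` is a hypothetical helper, the parameter names are copied from the examples above, and values are serialized as strings to mirror the JSON example (whether the server also accepts JSON numbers is not confirmed here).

```python
# Parameter names taken from the curl examples above.
SAMPLING_KEYS = (
    "top_p",
    "repetition_penalty",
    "temperature",
    "chunk_length",
    "max_new_tokens",
    "seed",
)


def build_advanced_payload(text, **settings):
    """Build a JSON-ready dict with optional generation settings.

    Raises ValueError for any setting not listed in SAMPLING_KEYS.
    """
    unknown = set(settings) - set(SAMPLING_KEYS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    payload = {"model": "fish-speech-1.5", "input": text}
    for key in SAMPLING_KEYS:
        if key in settings:
            # Strings, to match the JSON example in this README.
            payload[key] = str(settings[key])
    return payload
```

For instance, `build_advanced_payload("Hello", temperature=0.75, seed=42)` returns a dict that can be passed to `json.dumps` and posted to `/audio/speech`.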
Links
- https://github.com/fishaudio/fish-speech
- https://huggingface.co/fishaudio/fish-speech-1.5
- https://huggingface.co/fishaudio/fish-agent-v0.1-3b