
🧢 Woolball Server


Transform idle browsers into a powerful distributed AI inference network

Build your own browser-based inference infrastructure by turning idle browsers into compute nodes

🚀 Quick Start • 📖 API Reference • 🛠️ Development • 💬 Discord


✨ What is Woolball?

Woolball Server is an open-source server that orchestrates AI inference jobs across a distributed network of browser-based compute nodes. Instead of relying on expensive cloud infrastructure, harness the collective power of idle browsers to run AI models efficiently and cost-effectively.

🔗 Client side: Available in woolball-client
📋 Roadmap: Check our next steps


🎯 Supported AI Tasks

| 🔧 Provider | 🎯 Task | 🤖 Models | 📊 Status |
|-------------|---------|-----------|-----------|
| Transformers.js | 🎤 Speech-to-Text | ONNX Models | ✅ Ready |
| Transformers.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Kokoro.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Transformers.js | 🌐 Translation | ONNX Models | ✅ Ready |
| Transformers.js | 📝 Text Generation | ONNX Models | ✅ Ready |
| WebLLM | 📝 Text Generation | MLC Models | ✅ Ready |
| MediaPipe | 📝 Text Generation | LiteRT Models | ✅ Ready |

🚀 Quick Start

Get up and running in under 2 minutes:

1️⃣ Clone & Deploy

git clone --branch deploy --single-branch --depth 1 https://github.com/woolball-xyz/woolball-server.git
cd woolball-server && docker compose up -d

2️⃣ Verify Setup

Open http://localhost:9000 to ensure at least one client node is connected.
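
If no node appears, you can first confirm the containers came up cleanly (standard Docker Compose commands; exact service names depend on the compose file):

# List the stack's services and their status
docker compose ps

# Follow the logs to spot startup errors
docker compose logs -f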

3️⃣ Start Using the API

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Hello! Can you explain what Woolball is?"}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=200"

☁️ One-Click Deploy to DigitalOcean

Deploy Woolball to DigitalOcean App Platform with a single click:

Deploy to DO

🔧 What gets deployed:

  • 🌐 Woolball Client: Frontend interface accessible via your app URL
  • πŸ”Œ Core API: RESTful API for AI inference jobs (/api route)
  • πŸ”— WebSocket Server: Real-time communication with browser nodes (/ws route)
  • βš™οΈ Background Service: Job orchestration and node management
  • πŸ“Š Redis Database: Managed Redis instance for caching and queues

🚀 After Deployment:

  1. Your app will be available at https://your-app-name.ondigitalocean.app
  2. API endpoint: https://your-app-name.ondigitalocean.app/api/v1 (see the example request below)
  3. WebSocket: wss://your-app-name.ondigitalocean.app/ws
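
For example, here is the quick-start request from above pointed at the deployed API (replace your-app-name with your app's actual name):

curl -X POST https://your-app-name.ondigitalocean.app/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Hello from DigitalOcean!"}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=200"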

📖 API Reference

📖 Text Generation

Generate text with powerful language models

🤗 Transformers.js Provider

🤖 Available Models

| Model | Quantization | Description |
|-------|--------------|-------------|
| HuggingFaceTB/SmolLM2-135M-Instruct | fp16 | Compact model for basic text generation |
| HuggingFaceTB/SmolLM2-360M-Instruct | q4 | Balanced performance and size |
| Mozilla/Qwen2.5-0.5B-Instruct | q4 | Efficient model for general tasks |
| onnx-community/Qwen2.5-Coder-0.5B-Instruct | q8 | Specialized for code generation |

💡 Example Usage

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of Brazil?"}]' \
  -F "model=HuggingFaceTB/SmolLM2-135M-Instruct" \
  -F "dtype=fp16" \
  -F "max_new_tokens=250" \
  -F "temperature=0.7" \
  -F "do_sample=true"

βš™οΈ Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | string | - | 🤖 Model ID (e.g., "HuggingFaceTB/SmolLM2-135M-Instruct") |
| dtype | string | - | 🔧 Quantization level (e.g., "fp16", "q4") |
| max_length | number | 20 | 📏 Maximum length the generated tokens can have (includes the input prompt) |
| max_new_tokens | number | null | 🆕 Maximum number of tokens to generate, ignoring the prompt length |
| min_length | number | 0 | 📏 Minimum length of the sequence to be generated (includes the input prompt) |
| min_new_tokens | number | null | 🔢 Minimum number of tokens to generate, ignoring the prompt length |
| do_sample | boolean | false | 🎲 Whether to use sampling; greedy decoding is used otherwise |
| num_beams | number | 1 | 🔍 Number of beams for beam search; 1 means no beam search |
| temperature | number | 1.0 | 🌡️ Value used to modulate the next-token probabilities |
| top_k | number | 50 | 🔍 Number of highest-probability vocabulary tokens to keep for top-k filtering |
| top_p | number | 1.0 | 📊 If < 1, only the most probable tokens with probabilities adding up to top_p or higher are kept |
| repetition_penalty | number | 1.0 | 🔄 Repetition penalty parameter; 1.0 means no penalty |
| no_repeat_ngram_size | number | 0 | 🚫 If > 0, all n-grams of that size can only occur once |
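
As a sketch of how these decoding options combine (parameter names are from the table above; the values are illustrative, not tuned), the request below enables nucleus sampling with a mild repetition penalty:

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Write a short poem about distributed computing."}]' \
  -F "model=HuggingFaceTB/SmolLM2-360M-Instruct" \
  -F "dtype=q4" \
  -F "max_new_tokens=120" \
  -F "do_sample=true" \
  -F "top_p=0.9" \
  -F "repetition_penalty=1.2"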

🤖 WebLLM Provider

🤖 Available Models

| Model | Description |
|-------|-------------|
| DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | DeepSeek R1 distilled model with reasoning capabilities |
| DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | DeepSeek R1 distilled Llama-based model |
| SmolLM2-1.7B-Instruct-q4f32_1-MLC | Compact instruction-following model |
| Llama-3.1-8B-Instruct-q4f32_1-MLC | Meta's Llama 3.1 8B instruction model |
| Qwen3-8B-q4f32_1-MLC | Alibaba's Qwen3 8B model |

💡 Example Usage

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of Brazil?"}]' \
  -F "model=DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC" \
  -F "provider=webllm" \
  -F "temperature=0.7" \
  -F "top_p=0.95"

βš™οΈ Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| model | string | 🤖 Model ID from MLC (e.g., "DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC") |
| provider | string | 🔧 Must be set to "webllm" when using WebLLM models |
| context_window_size | number | 🪟 Size of the model's context window |
| sliding_window_size | number | 🔄 Size of the sliding window for attention |
| attention_sink_size | number | 🎯 Size of the attention sink |
| repetition_penalty | number | 🔄 Penalty for repeating tokens |
| frequency_penalty | number | 📊 Penalty for token frequency |
| presence_penalty | number | 👁️ Penalty for token presence |
| top_p | number | 📈 If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| temperature | number | 🌡️ Value used to modulate the next-token probabilities |
| bos_token_id | number | 🏁 Beginning-of-sequence token ID (optional) |
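
A sketch combining a few of these parameters the same way the example above passes temperature and top_p (the values are illustrative):

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Summarize the benefits of distributed inference."}]' \
  -F "model=SmolLM2-1.7B-Instruct-q4f32_1-MLC" \
  -F "provider=webllm" \
  -F "temperature=0.8" \
  -F "frequency_penalty=0.5" \
  -F "presence_penalty=0.3"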

📱 MediaPipe Provider

🤖 Available Models

| Model | Device Type | Description |
|-------|-------------|-------------|
| https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-cpu-int8.task | CPU | Gemma2 2B model optimized for CPU inference |
| https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-gpu-int8.bin | GPU | Gemma2 2B model optimized for GPU inference |
| https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task | CPU/GPU | Gemma3 1B model with INT4 quantization |
| https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-4b-it-int4-web.task | Web | Gemma3 4B model optimized for web deployment |

💡 Example Usage

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Explain quantum computing in simple terms."}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=500" \
  -F "temperature=0.7" \
  -F "topK=40" \
  -F "randomSeed=12345"

βš™οΈ Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| model | string | 🤖 Model URL for a MediaPipe LiteRT model (hosted on DigitalOcean Spaces) |
| provider | string | 🔧 Must be set to "mediapipe" when using MediaPipe models |
| maxTokens | number | 🔢 Maximum number of tokens to generate |
| randomSeed | number | 🎲 Random seed for reproducible results |
| topK | number | 🔍 Number of highest-probability vocabulary tokens to keep for top-k filtering |
| temperature | number | 🌡️ Value used to modulate the next-token probabilities |

🎤 Speech Recognition

Convert audio to text with Whisper models

🤖 Available Models

| Model | Quantization | Description |
|-------|--------------|-------------|
| onnx-community/whisper-large-v3-turbo_timestamped | q4 | 🎯 High accuracy with timestamps |
| onnx-community/whisper-small | q4 | ⚡ Fast processing |

💡 Example Usage

# πŸ“ Local file
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=@/path/to/your/file.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

# 🔗 URL
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=https://example.com/audio.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

# 📊 Base64
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=data:audio/mp3;base64,YOUR_BASE64_ENCODED_AUDIO" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

βš™οΈ Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| model | string | 🤖 Model ID from Hugging Face (e.g., "onnx-community/whisper-large-v3-turbo_timestamped") |
| dtype | string | 🔧 Quantization level (e.g., "q4") |
| return_timestamps | boolean \| 'word' | ⏰ Return timestamps ("word" for word-level). Default is false. |
| stream | boolean | 📡 Stream results in real time. Default is false. |
| chunk_length_s | number | 📏 Length of audio chunks to process, in seconds. Default is 0 (no chunking). |
| stride_length_s | number | 🔄 Overlap between consecutive audio chunks, in seconds. If not provided, defaults to chunk_length_s / 6. |
| force_full_sequences | boolean | 🎯 Whether to force outputting full sequences. Default is false. |
| language | string | 🌍 Source language (auto-detected if null). Set it when known to potentially improve performance. |
| task | null \| 'transcribe' \| 'translate' | 🎯 The task to perform. Default is null, meaning it is auto-detected. |
| num_frames | number | 🎬 The number of frames in the input audio. |
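
For long recordings, chunking with a small overlap keeps processing bounded; the parameters come straight from the table above, and the values below are only a reasonable starting point:

curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=@/path/to/long-recording.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "chunk_length_s=30" \
  -F "stride_length_s=5" \
  -F "return_timestamps=true"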

🔊 Text-to-Speech

Generate natural speech from text

🤗 Transformers.js (MMS Models)

🤖 Available Models

| Language | Model | Flag |
|----------|-------|------|
| English | Xenova/mms-tts-eng | 🇺🇸 |
| Spanish | Xenova/mms-tts-spa | 🇪🇸 |
| French | Xenova/mms-tts-fra | 🇫🇷 |
| German | Xenova/mms-tts-deu | 🇩🇪 |
| Portuguese | Xenova/mms-tts-por | 🇵🇹 |
| Russian | Xenova/mms-tts-rus | 🇷🇺 |
| Arabic | Xenova/mms-tts-ara | 🇸🇦 |
| Korean | Xenova/mms-tts-kor | 🇰🇷 |

💡 Example Usage

# Standard request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test for text to speech." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=false"

# Streaming request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test for streaming text to speech." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=true"

βš™οΈ Parameters

| Parameter | Type | Description | Required For |
|-----------|------|-------------|--------------|
| model | string | 🤖 Model ID | All providers |
| dtype | string | 🔧 Quantization level (e.g., "q8") | All providers |
| stream | boolean | 📡 Whether to stream the audio response. Default is false. | All providers |
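
Since the endpoint returns audio, you will usually want to write the response straight to a file. The .wav extension below is an assumption; adjust it to whatever container the server actually returns:

curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Saving this sentence to a file." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=false" \
  --output speech.wav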

🐱 Kokoro (Premium Voices)

🤖 Available Models

| Model | Quantization | Description |
|-------|--------------|-------------|
| onnx-community/Kokoro-82M-ONNX | q8 | High-quality English TTS with multiple voices |
| onnx-community/Kokoro-82M-v1.0-ONNX | q8 | Alternative Kokoro model version |

💡 Example Usage

# Standard request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test using Kokoro voices." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=af_nova" \
  -F "dtype=q8" \
  -F "stream=false"

# Streaming request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test using Kokoro voices with streaming." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=af_nova" \
  -F "dtype=q8" \
  -F "stream=true"

βš™οΈ Parameters

| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| model | string | 🤖 Model ID | Required |
| dtype | string | 🔧 Quantization level (e.g., "q8") | Required |
| voice | string | 🎭 Voice ID (see below) | Required |
| stream | boolean | 📡 Whether to stream the audio response. Default is false. | Optional |

🎭 Available Voice Options

🇺🇸 American Voices

  • πŸ‘© Female: af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_nova, af_sarah
  • πŸ‘¨ Male: am_adam, am_echo, am_eric, am_liam, am_michael, am_onyx

🇬🇧 British Voices

  • πŸ‘© Female: bf_emma, bf_isabella, bf_alice, bf_lily
  • πŸ‘¨ Male: bm_george, bm_lewis, bm_daniel, bm_fable

🌐 Translation

Translate between 200+ languages

🤖 Available Models

| Model | Quantization | Description |
|-------|--------------|-------------|
| Xenova/nllb-200-distilled-600M | q8 | 🌍 Multilingual translation model supporting 200+ languages |

💡 Example Usage

curl -X POST http://localhost:9002/api/v1/translation \
  -F "input=Hello, how are you today?" \
  -F "model=Xenova/nllb-200-distilled-600M" \
  -F "dtype=q8" \
  -F "srcLang=eng_Latn" \
  -F "tgtLang=por_Latn"

🌍 Language Support

Uses the FLORES200 language-code format and supports 200+ languages!
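
For example, changing tgtLang translates the same sentence into Spanish instead; FLORES200 codes such as spa_Latn and fra_Latn all follow the same language_Script pattern:

curl -X POST http://localhost:9002/api/v1/translation \
  -F "input=Hello, how are you today?" \
  -F "model=Xenova/nllb-200-distilled-600M" \
  -F "dtype=q8" \
  -F "srcLang=eng_Latn" \
  -F "tgtLang=spa_Latn"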

βš™οΈ Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| model | string | 🤖 Model ID (e.g., "Xenova/nllb-200-distilled-600M") |
| dtype | string | 🔧 Quantization level (e.g., "q8") |
| srcLang | string | 🌍 Source language code in FLORES200 format (e.g., "eng_Latn") |
| tgtLang | string | 🌍 Target language code in FLORES200 format (e.g., "por_Latn") |

πŸ› οΈ Local Development

🐳 Docker Setup (Recommended)

git clone https://github.com/woolball-xyz/woolball-server.git
cd woolball-server && docker compose up --build -d

🌐 Service Endpoints

| 🔧 Service | 🚪 Port | 🔗 URL |
|-----------|---------|--------|
| 🔌 WebSocket | 9003 | localhost:9003 |
| 🌐 API Server | 9002 | localhost:9002 |
| 👥 Client Demo | 9000 | localhost:9000 |
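
To sanity-check the WebSocket endpoint, any WS client will do; for example, with the third-party websocat tool (whether the server accepts a bare connection or expects a specific path here is an assumption):

websocat ws://localhost:9003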

🔄 Network Flow

Network Architecture


🤝 Contributing

We welcome contributions! Here's how you can help:

  • πŸ› Report bugs via GitHub Issues
  • πŸ’‘ Suggest features in our Discord
  • πŸ”§ Submit PRs for improvements
  • πŸ“– Improve documentation

📄 License

This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.


Made with ❤️ by the Woolball team

🌟 Star us on GitHub • 💬 Join Discord