🧶 Woolball Server
Transform idle browsers into a powerful distributed AI inference network
Your own browser-based inference infrastructure, built by turning idle browsers into compute nodes
Quick Start • API Reference • Development • Discord
What is Woolball?
Woolball Server is an open-source server that orchestrates AI inference jobs across a distributed network of browser-based compute nodes. Instead of relying on expensive cloud infrastructure, it harnesses the collective power of idle browsers to run AI models efficiently and cost-effectively.
- Client side: available in the woolball-client repository
- Roadmap: check our next steps
Supported AI Tasks
| Provider | Task | Models | Status |
|---|---|---|---|
| Transformers.js | Speech-to-Text | ONNX Models | ✅ Ready |
| Transformers.js | Text-to-Speech | ONNX Models | ✅ Ready |
| Kokoro.js | Text-to-Speech | ONNX Models | ✅ Ready |
| Transformers.js | Translation | ONNX Models | ✅ Ready |
| Transformers.js | Text Generation | ONNX Models | ✅ Ready |
| WebLLM | Text Generation | MLC Models | ✅ Ready |
| MediaPipe | Text Generation | LiteRT Models | ✅ Ready |
Quick Start
Get up and running in under 2 minutes:
1. Clone & Deploy
```bash
git clone --branch deploy --single-branch --depth 1 https://github.com/woolball-xyz/woolball-server.git
cd woolball-server && docker compose up -d
```
2. Verify Setup
Open http://localhost:9000 to ensure at least one client node is connected.
3. Start Using the API
```bash
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Hello! Can you explain what Woolball is?"}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=200"
```
One-Click Deploy to DigitalOcean
Deploy Woolball to DigitalOcean App Platform with a single click:
What gets deployed:
- Woolball Client: Frontend interface accessible via your app URL
- Core API: RESTful API for AI inference jobs (`/api` route)
- WebSocket Server: Real-time communication with browser nodes (`/ws` route)
- Background Service: Job orchestration and node management
- Redis Database: Managed Redis instance for caching and queues
After Deployment:
- Your app will be available at `https://your-app-name.ondigitalocean.app`
- API endpoint: `https://your-app-name.ondigitalocean.app/api/v1`
- WebSocket: `wss://your-app-name.ondigitalocean.app/ws`
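Calling the deployed API works the same way as in the Quick Start; only the base URL changes. A short TypeScript sketch ("your-app-name" is the same placeholder as in the URLs above):

```typescript
// Same request as the Quick Start, pointed at the deployed App Platform URL.
// "your-app-name" is a placeholder for the name DigitalOcean assigns to your app.
const BASE_URL = "https://your-app-name.ondigitalocean.app/api/v1";

const form = new FormData();
form.append("input", JSON.stringify([{ role: "user", content: "Hello from the deployed API!" }]));
form.append("model", "https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task");
form.append("provider", "mediapipe");
form.append("maxTokens", "200");

const res = await fetch(`${BASE_URL}/text-generation`, { method: "POST", body: form });
console.log(res.status, await res.text());
```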
API Reference
Text Generation
Generate text with powerful language models
Transformers.js Provider
Available Models
| Model | Quantization | Description |
|---|---|---|
| `HuggingFaceTB/SmolLM2-135M-Instruct` | `fp16` | Compact model for basic text generation |
| `HuggingFaceTB/SmolLM2-360M-Instruct` | `q4` | Balanced performance and size |
| `Mozilla/Qwen2.5-0.5B-Instruct` | `q4` | Efficient model for general tasks |
| `onnx-community/Qwen2.5-Coder-0.5B-Instruct` | `q8` | Specialized for code generation |
Example Usage
```bash
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of Brazil?"}]' \
  -F "model=HuggingFaceTB/SmolLM2-135M-Instruct" \
  -F "dtype=fp16" \
  -F "max_new_tokens=250" \
  -F "temperature=0.7" \
  -F "do_sample=true"
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | - | Model ID (e.g., "HuggingFaceTB/SmolLM2-135M-Instruct") |
| `dtype` | string | - | Quantization level (e.g., "fp16", "q4") |
| `max_length` | number | 20 | Maximum length the generated tokens can have (includes the input prompt) |
| `max_new_tokens` | number | null | Maximum number of tokens to generate, ignoring the prompt length |
| `min_length` | number | 0 | Minimum length of the sequence to be generated (includes the input prompt) |
| `min_new_tokens` | number | null | Minimum number of tokens to generate, ignoring the prompt length |
| `do_sample` | boolean | false | Whether to use sampling; greedy decoding is used otherwise |
| `num_beams` | number | 1 | Number of beams for beam search; 1 means no beam search |
| `temperature` | number | 1.0 | Value used to modulate the next-token probabilities |
| `top_k` | number | 50 | Number of highest-probability vocabulary tokens to keep for top-k filtering |
| `top_p` | number | 1.0 | If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| `repetition_penalty` | number | 1.0 | Parameter for repetition penalty; 1.0 means no penalty |
| `no_repeat_ngram_size` | number | 0 | If > 0, all n-grams of that size can only occur once |
WebLLM Provider
Available Models
| Model | Description |
|---|---|
| `DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC` | DeepSeek R1 distilled model with reasoning capabilities |
| `DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC` | DeepSeek R1 distilled Llama-based model |
| `SmolLM2-1.7B-Instruct-q4f32_1-MLC` | Compact instruction-following model |
| `Llama-3.1-8B-Instruct-q4f32_1-MLC` | Meta's Llama 3.1 8B instruction model |
| `Qwen3-8B-q4f32_1-MLC` | Alibaba's Qwen3 8B model |
Example Usage
```bash
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of Brazil?"}]' \
  -F "model=DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC" \
  -F "provider=webllm" \
  -F "temperature=0.7" \
  -F "top_p=0.95"
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID from MLC (e.g., "DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC") |
| `provider` | string | Must be set to "webllm" when using WebLLM models |
| `context_window_size` | number | Size of the context window for the model |
| `sliding_window_size` | number | Size of the sliding window for attention |
| `attention_sink_size` | number | Size of the attention sink |
| `repetition_penalty` | number | Penalty for repeating tokens |
| `frequency_penalty` | number | Penalty for token frequency |
| `presence_penalty` | number | Penalty for token presence |
| `top_p` | number | If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| `temperature` | number | Value used to modulate the next-token probabilities |
| `bos_token_id` | number | Beginning-of-sequence token ID (optional) |
MediaPipe Provider
Available Models
| Model | Device Type | Description |
|---|---|---|
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-cpu-int8.task` | CPU | Gemma2 2B model optimized for CPU inference |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-gpu-int8.bin` | GPU | Gemma2 2B model optimized for GPU inference |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task` | CPU/GPU | Gemma3 1B model with INT4 quantization |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-4b-it-int4-web.task` | Web | Gemma3 4B model optimized for web deployment |
Example Usage
```bash
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Explain quantum computing in simple terms."}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=500" \
  -F "temperature=0.7" \
  -F "topK=40" \
  -F "randomSeed=12345"
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model URL for MediaPipe LiteRT models hosted on DigitalOcean Spaces |
| `provider` | string | Must be set to "mediapipe" when using MediaPipe models |
| `maxTokens` | number | Maximum number of tokens to generate |
| `randomSeed` | number | Random seed for reproducible results |
| `topK` | number | Number of highest-probability vocabulary tokens to keep for top-k filtering |
| `temperature` | number | Value used to modulate the next-token probabilities |
Speech Recognition
Convert audio to text with Whisper models
Available Models
| Model | Quantization | Description |
|---|---|---|
| `onnx-community/whisper-large-v3-turbo_timestamped` | `q4` | High accuracy with timestamps |
| `onnx-community/whisper-small` | `q4` | Fast processing |
Example Usage
```bash
# Local file
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=@/path/to/your/file.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

# URL
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=https://example.com/audio.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

# Base64
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=data:audio/mp3;base64,YOUR_BASE64_ENCODED_AUDIO" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID from Hugging Face (e.g., "onnx-community/whisper-large-v3-turbo_timestamped") |
| `dtype` | string | Quantization level (e.g., "q4") |
| `return_timestamps` | boolean \| 'word' | Return timestamps ("word" for word-level). Default is false. |
| `stream` | boolean | Stream results in real time. Default is false. |
| `chunk_length_s` | number | Length of audio chunks to process, in seconds. Default is 0 (no chunking). |
| `stride_length_s` | number | Length of overlap between consecutive audio chunks, in seconds. Defaults to chunk_length_s / 6 if not provided. |
| `force_full_sequences` | boolean | Whether to force outputting full sequences. Default is false. |
| `language` | string | Source language (auto-detected if null). Setting this can improve performance when the source language is known. |
| `task` | null \| 'transcribe' \| 'translate' | The task to perform. Default is null, meaning it is auto-detected. |
| `num_frames` | number | The number of frames in the input audio. |
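For the base64 variant, the audio bytes must be wrapped in a data URI before being placed in the `input` field. A minimal TypeScript sketch of that step (Node 18+; logging the body as JSON is an assumption about the response format):

```typescript
import { readFile } from "node:fs/promises";

// Minimal sketch: transcribe a local MP3 by sending it as a base64 data URI.
// Assumes Node 18+ (global fetch/FormData); the response is assumed to be JSON.
async function transcribe(path: string): Promise<void> {
  const audio = await readFile(path);
  const dataUri = `data:audio/mp3;base64,${audio.toString("base64")}`;

  const form = new FormData();
  form.append("input", dataUri);
  form.append("model", "onnx-community/whisper-large-v3-turbo_timestamped");
  form.append("dtype", "q4");
  form.append("language", "en");
  form.append("return_timestamps", "true");

  const res = await fetch("http://localhost:9002/api/v1/speech-recognition", {
    method: "POST",
    body: form,
  });
  console.log(await res.json());
}

await transcribe("/path/to/your/file.mp3");
```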
Text-to-Speech
Generate natural speech from text
Transformers.js (MMS Models)
Available Models
| Language | Model | Flag |
|---|---|---|
| English | `Xenova/mms-tts-eng` | 🇺🇸 |
| Spanish | `Xenova/mms-tts-spa` | 🇪🇸 |
| French | `Xenova/mms-tts-fra` | 🇫🇷 |
| German | `Xenova/mms-tts-deu` | 🇩🇪 |
| Portuguese | `Xenova/mms-tts-por` | 🇵🇹 |
| Russian | `Xenova/mms-tts-rus` | 🇷🇺 |
| Arabic | `Xenova/mms-tts-ara` | 🇸🇦 |
| Korean | `Xenova/mms-tts-kor` | 🇰🇷 |
Example Usage
```bash
# Standard request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test for text to speech." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=false"

# Streaming request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test for streaming text to speech." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=true"
```
Parameters
| Parameter | Type | Description | Required For |
|---|---|---|---|
| `model` | string | Model ID | All providers |
| `dtype` | string | Quantization level (e.g., "q8") | All providers |
| `stream` | boolean | Whether to stream the audio response. Default is false. | All providers |
Kokoro (Premium Voices)
Available Models
| Model | Quantization | Description |
|---|---|---|
| `onnx-community/Kokoro-82M-ONNX` | `q8` | High-quality English TTS with multiple voices |
| `onnx-community/Kokoro-82M-v1.0-ONNX` | `q8` | Alternative Kokoro model version |
Example Usage
```bash
# Standard request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test using Kokoro voices." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=af_nova" \
  -F "dtype=q8" \
  -F "stream=false"

# Streaming request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test using Kokoro voices with streaming." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=af_nova" \
  -F "dtype=q8" \
  -F "stream=true"
```
Parameters
| Parameter | Type | Description | Required For |
|---|---|---|---|
| `model` | string | Model ID | Required |
| `dtype` | string | Quantization level (e.g., "q8") | Required |
| `voice` | string | Voice ID (see below) | Required |
| `stream` | boolean | Whether to stream the audio response. Default is false. | Optional |
Available Voice Options
🇺🇸 American Voices
- Female: `af_heart`, `af_alloy`, `af_aoede`, `af_bella`, `af_jessica`, `af_nova`, `af_sarah`
- Male: `am_adam`, `am_echo`, `am_eric`, `am_liam`, `am_michael`, `am_onyx`
🇬🇧 British Voices
- Female: `bf_emma`, `bf_isabella`, `bf_alice`, `bf_lily`
- Male: `bm_george`, `bm_lewis`, `bm_daniel`, `bm_fable`
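When calling the endpoint from code instead of curl, the non-streaming response can be written straight to disk. A minimal TypeScript sketch (Node 18+), assuming the response body contains raw audio bytes; the container format, and therefore the `.wav` extension, is an assumption rather than documented behavior:

```typescript
import { writeFile } from "node:fs/promises";

// Minimal sketch: request Kokoro speech and save the response body to a file.
// Assumes Node 18+ and that the non-streaming response body is raw audio bytes;
// the output container/extension is an assumption, not documented behavior.
const form = new FormData();
form.append("input", "Hello, this is a test using Kokoro voices.");
form.append("model", "onnx-community/Kokoro-82M-ONNX");
form.append("voice", "af_nova");
form.append("dtype", "q8");
form.append("stream", "false");

const res = await fetch("http://localhost:9002/api/v1/text-to-speech", {
  method: "POST",
  body: form,
});

const audio = Buffer.from(await res.arrayBuffer());
await writeFile("output.wav", audio);
console.log(`Saved ${audio.length} bytes to output.wav`);
```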
Translation
Translate between 200+ languages
Available Models
| Model | Quantization | Description |
|---|---|---|
| `Xenova/nllb-200-distilled-600M` | `q8` | Multilingual translation model supporting 200+ languages |
Example Usage
```bash
curl -X POST http://localhost:9002/api/v1/translation \
  -F "input=Hello, how are you today?" \
  -F "model=Xenova/nllb-200-distilled-600M" \
  -F "dtype=q8" \
  -F "srcLang=eng_Latn" \
  -F "tgtLang=por_Latn"
```
Language Support
Uses FLORES200 format - supports 200+ languages!
Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID (e.g., "Xenova/nllb-200-distilled-600M") |
| `dtype` | string | Quantization level (e.g., "q8") |
| `srcLang` | string | Source language code in FLORES200 format (e.g., "eng_Latn") |
| `tgtLang` | string | Target language code in FLORES200 format (e.g., "por_Latn") |
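Since source and target languages are just FLORES200 codes, translating the same text into several languages is a simple loop over the `tgtLang` field. A minimal TypeScript sketch (Node 18+); `spa_Latn` and `fra_Latn` are standard FLORES200 codes added here for illustration, and the response is assumed to be JSON:

```typescript
// Minimal sketch: fan one sentence out to several FLORES200 target languages.
// Assumes Node 18+ (global fetch/FormData) and a JSON response body.
const targets = ["por_Latn", "spa_Latn", "fra_Latn"];

for (const tgtLang of targets) {
  const form = new FormData();
  form.append("input", "Hello, how are you today?");
  form.append("model", "Xenova/nllb-200-distilled-600M");
  form.append("dtype", "q8");
  form.append("srcLang", "eng_Latn");
  form.append("tgtLang", tgtLang);

  const res = await fetch("http://localhost:9002/api/v1/translation", {
    method: "POST",
    body: form,
  });
  console.log(tgtLang, await res.json());
}
```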
Local Development
Docker Setup (Recommended)
```bash
git clone https://github.com/woolball-xyz/woolball-server.git
cd woolball-server && docker compose up --build -d
```
Service Endpoints
| Service | Port | URL |
|---|---|---|
| WebSocket | 9003 | localhost:9003 |
| API Server | 9002 | localhost:9002 |
| Client Demo | 9000 | localhost:9000 |
Network Flow

Contributing
We welcome contributions! Here's how you can help:
- Report bugs via GitHub Issues
- Suggest features in our Discord
- Submit PRs for improvements
- Improve documentation
License
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.
Made with ❤️ by the Woolball team