feat: add prefill progress bar for long prompts
Motivation
Users processing long prompts have no visibility into when token generation will start. This feature adds a progress bar showing prefill progress, giving users real-time feedback during prompt processing.
Changes
Backend
- Added PrefillProgress event type with command_id, processed_tokens, total_tokens
- Added PrefillProgressResponse type (though now using direct callback approach)
- Wired prompt_progress_callback through MLX's stream_generate()
- Progress events sent directly from callback for real-time updates (not batched)
- API generates SSE named events: event: prefill_progress\ndata: {...}
- Added PrefillProgressData dataclass and StreamEvent union type in API
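For reviewers unfamiliar with the wire format, here is a minimal sketch of how a named SSE frame for prefill progress could be serialized. The field names follow the list above; the helper names format_named_sse_event and format_prefill_progress are hypothetical and are not taken from this PR's code.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PrefillProgressData:
    # Field names mirror the PR description; the real dataclass may differ.
    command_id: str
    processed_tokens: int
    total_tokens: int


def format_named_sse_event(event_name: str, payload: dict) -> str:
    """Serialize one SSE frame: an event: line, a data: line, then a blank line."""
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"


def format_prefill_progress(progress: PrefillProgressData) -> str:
    """Hypothetical helper that wraps a progress update in a named event."""
    return format_named_sse_event("prefill_progress", asdict(progress))
```

The blank line at the end of the frame is what terminates an SSE event, so each progress update is delivered as its own event rather than being merged with the next token chunk.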
Dashboard
- Added PrefillProgress interface to store
- Updated SSE parsing to handle event: lines (named events)
- Created PrefillProgressBar.svelte with animated progress bar
- Shows "Processing prompt: X/Y tokens" with percentage
- Progress bar disappears when first token arrives
Why It Works
MLX's stream_generate() accepts a prompt_progress_callback(processed, total) that's called after each prefill chunk. By sending events directly from this callback (rather than yielding from the generator), progress updates are sent in real-time during prefill.
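The difference between emitting from the callback and yielding from the generator can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: fake_stream_generate is a hypothetical stand-in that only mimics the (processed, total) callback shape described above, and the queue-based hand-off is one possible way to forward events.

```python
import queue
import threading
from typing import Callable, Iterator, List, Tuple

ProgressCallback = Callable[[int, int], None]


def fake_stream_generate(
    total_prompt_tokens: int,
    prefill_step_size: int,
    prompt_progress_callback: ProgressCallback,
) -> Iterator[str]:
    """Hypothetical stand-in for a generator with a prefill phase (not the MLX API)."""
    processed = 0
    while processed < total_prompt_tokens:
        processed = min(processed + prefill_step_size, total_prompt_tokens)
        # Called after each prefill chunk, before any token has been yielded.
        prompt_progress_callback(processed, total_prompt_tokens)
    yield from ("Hello", " world")


def run_generation(command_id: str, total_prompt_tokens: int, events: queue.Queue) -> None:
    def on_progress(processed: int, total: int) -> None:
        # Push immediately from the callback; waiting for the generator to
        # yield would delay every update until the first token.
        events.put(("prefill_progress", {
            "command_id": command_id,
            "processed_tokens": processed,
            "total_tokens": total,
        }))

    for token in fake_stream_generate(total_prompt_tokens, 256, on_progress):
        events.put(("token", token))
    events.put(("done", None))


def collect_events(command_id: str, total_prompt_tokens: int) -> List[Tuple[str, object]]:
    """Run generation on a worker thread and drain events in arrival order."""
    events: queue.Queue = queue.Queue()
    worker = threading.Thread(
        target=run_generation, args=(command_id, total_prompt_tokens, events)
    )
    worker.start()
    collected = []
    while True:
        item = events.get()
        collected.append(item)
        if item[0] == "done":
            break
    worker.join()
    return collected
```

Because the generator body does not yield until prefill has finished, a consumer that only iterates over yielded values sees nothing during prefill; the callback is the only hook that fires while the prompt is still being processed.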
Using SSE named events (event: prefill_progress) maintains full OpenAI/Claude API compatibility - standard clients ignore named events they don't recognize, while the exo dashboard explicitly listens for them.
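To illustrate the compatibility argument, here is a simplified SSE parser sketch, written in Python for brevity even though the dashboard itself is Svelte. It assumes frames separated by blank lines and handles only event: and data: fields; parse_sse and token_payloads are hypothetical names, and no claim is made about how any particular SDK parses streams internally.

```python
from typing import Iterator, List, Optional, Tuple


def _field_value(line: str, prefix: str) -> str:
    """Return the text after 'prefix', dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(stream_text: str) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) per frame; event_name is None for unnamed events."""
    event_name: Optional[str] = None
    data_lines: List[str] = []
    for line in stream_text.split("\n"):
        if line == "":
            # A blank line terminates the current frame.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data:"))


def token_payloads(stream_text: str) -> List[str]:
    """What a client that reads only unnamed data: events would see."""
    return [data for name, data in parse_sse(stream_text) if name is None]


def prefill_payloads(stream_text: str) -> List[str]:
    """What a client that explicitly listens for prefill_progress would see."""
    return [data for name, data in parse_sse(stream_text) if name == "prefill_progress"]
```

A consumer built like token_payloads never observes the progress frames, while one built like prefill_payloads can pick them out of the same stream.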
Test Plan
Manual Testing
- Hardware: MacBook Pro M3 Max
- Set prefill_step_size=256 for more frequent updates
- Tested with long prompts (pasted large documents)
- Verified progress bar updates incrementally during prefill
- Confirmed progress bar disappears when generation starts
- Tested with curl - standard data: events still work normally
Automated Testing
- Type checker passes (0 errors)
- All 192 tests pass
- Dashboard builds successfully
API Compatibility
- Named SSE events are ignored by OpenAI SDK clients
- Regular token data uses standard data: {...} format
- [DONE] sentinel works as expected
Note: prefill_step_size is temporarily set to 256 for testing. It should be changed back to 2048 before merging to restore production performance.