feat: add prefill progress bar for long prompts

Open AlexCheema opened this issue 1 month ago • 0 comments

Motivation

Users processing long prompts have no visibility into when token generation will start. This feature adds a progress bar showing prefill progress, giving users real-time feedback during prompt processing.

Changes

Backend

Added PrefillProgress event type with command_id, processed_tokens, total_tokens
Added PrefillProgressResponse type (kept for now, although progress is currently delivered via the direct callback approach)
Wired prompt_progress_callback through MLX's stream_generate()
Progress events sent directly from callback for real-time updates (not batched)
API generates SSE named events: event: prefill_progress\ndata: {...}
Added PrefillProgressData dataclass and StreamEvent union type in API
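
As an illustration of the backend pieces listed above, here is a minimal sketch of how a progress payload could be serialized into a named SSE event. The field names come from the PrefillProgress event described above; the helper names (format_named_sse_event, format_prefill_progress) and the str type for command_id are assumptions made for this example, not the actual code in this change.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PrefillProgressData:
    # Field names follow the PrefillProgress event described in this PR.
    # The str type for command_id is an assumption for illustration.
    command_id: str
    processed_tokens: int
    total_tokens: int


def format_named_sse_event(event_name: str, payload: dict) -> str:
    """Serialize one named SSE event: an `event:` line, a `data:` line,
    then a blank line that terminates the event."""
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"


def format_prefill_progress(progress: PrefillProgressData) -> str:
    # Hypothetical helper: wraps the payload in a `prefill_progress` event.
    return format_named_sse_event("prefill_progress", asdict(progress))


if __name__ == "__main__":
    print(format_prefill_progress(PrefillProgressData("cmd-1", 256, 1024)), end="")
```

Regular token chunks would keep using plain data: lines without an event: line, so the two kinds of message stay distinguishable on the wire.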

Dashboard

Added PrefillProgress interface to store
Updated SSE parsing to handle event: lines (named events)
Created PrefillProgressBar.svelte with animated progress bar
Shows "Processing prompt: X/Y tokens" with percentage
Progress bar disappears when first token arrives

Why It Works

MLX's stream_generate() accepts a prompt_progress_callback(processed, total) that's called after each prefill chunk. By sending events directly from this callback (rather than yielding from the generator), progress updates are sent in real-time during prefill.
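
The difference between reporting from the callback and yielding from the generator can be sketched with a stand-in generator. fake_stream_generate below is a hypothetical substitute written for this example (it is not MLX's function); it only mimics the behavior described above, where the callback fires after each prefill chunk and nothing is yielded until prefill completes.

```python
import queue
from typing import Callable, Iterator, List, Optional, Tuple

ProgressCallback = Callable[[int, int], None]


def fake_stream_generate(
    prompt_tokens: List[int],
    prefill_step_size: int,
    prompt_progress_callback: Optional[ProgressCallback] = None,
) -> Iterator[str]:
    """Hypothetical stand-in: prefill in chunks, then yield generated tokens."""
    total = len(prompt_tokens)
    processed = 0
    while processed < total:
        processed = min(processed + prefill_step_size, total)
        if prompt_progress_callback is not None:
            # Fires during prefill, before the generator has yielded anything.
            prompt_progress_callback(processed, total)
    yield "first-token"


# Events are pushed onto a queue from inside the callback, so a consumer
# (for example the SSE writer) can forward them while prefill is running.
events: "queue.Queue[Tuple[int, int]]" = queue.Queue()


def on_progress(processed: int, total: int) -> None:
    events.put((processed, total))


gen = fake_stream_generate(list(range(1000)), 256, on_progress)
first = next(gen)  # blocks until prefill has finished

drained = []
while not events.empty():
    drained.append(events.get_nowait())
```

By the time the first token is available, every progress event has already been queued. A design that yielded progress from the generator instead would have nothing to yield until the underlying call returned control.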

Using SSE named events (event: prefill_progress) maintains full OpenAI/Claude API compatibility: standard clients ignore named events they don't recognize, while the exo dashboard explicitly listens for them.
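
To make the named-event handling concrete, here is a minimal sketch of an SSE parser that tracks event: lines and separates prefill progress from regular token data. It is written in Python for illustration only (the dashboard itself is Svelte), the function name parse_sse is hypothetical, and the sample payloads are made up for the example.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) pairs; event_name is None for unnamed events."""
    event_name: Optional[str] = None
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Drop the single optional space after the colon.
            data_lines.append(line[len("data:"):].removeprefix(" "))


# Illustrative stream: one named progress event, one token chunk, the sentinel.
stream = [
    "event: prefill_progress",
    'data: {"processed_tokens": 256, "total_tokens": 1024}',
    "",
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    "data: [DONE]",
    "",
]

parsed = list(parse_sse(stream))
progress_events = [data for name, data in parsed if name == "prefill_progress"]
token_events = [data for name, data in parsed if name is None]
```

A consumer that only reads unnamed events sees the same token stream as before, while one that checks the event name can drive the progress bar.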

Test Plan

Manual Testing

Hardware: MacBook Pro M3 Max
Set prefill_step_size=256 for more frequent updates
Tested with long prompts (pasted large documents)
Verified progress bar updates incrementally during prefill
Confirmed progress bar disappears when generation starts
Tested with curl - standard data: events still work normally

Automated Testing

Type checker passes (0 errors)
All 192 tests pass
Dashboard builds successfully

API Compatibility

Named SSE events are ignored by OpenAI SDK clients
Regular token data uses standard data: {...} format
[DONE] sentinel works as expected

Note: prefill_step_size is temporarily set to 256 for testing. It should be changed back to 2048 before merging to restore production performance.
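
As a rough guide to the trade-off, the number of progress updates can be estimated from the step size, assuming one callback per prefill chunk. The function name and the 8192-token prompt length below are illustrative assumptions, not measurements from this change.

```python
import math


def estimated_progress_updates(total_tokens: int, prefill_step_size: int) -> int:
    """Estimate callback count, assuming one callback per prefill chunk."""
    return math.ceil(total_tokens / prefill_step_size)


# Hypothetical 8192-token prompt.
updates_at_256 = estimated_progress_updates(8192, 256)    # testing step size
updates_at_2048 = estimated_progress_updates(8192, 2048)  # production step size
```

Under that assumption, the testing value gives 32 updates for such a prompt and the production value gives 4, so the bar will move in coarser steps after the value is restored.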

Jan 17 '26 17:01 AlexCheema