
feat: add prefill progress bar for long prompts

Open AlexCheema opened this issue 1 month ago • 0 comments

Motivation

Users processing long prompts have no visibility into when token generation will start. This feature adds a progress bar showing prefill progress, giving users real-time feedback during prompt processing.

Changes

Backend

  • Added PrefillProgress event type with command_id, processed_tokens, total_tokens
  • Added PrefillProgressResponse type (though progress is now sent using the direct callback approach)
  • Wired prompt_progress_callback through MLX's stream_generate()
  • Progress events sent directly from callback for real-time updates (not batched)
  • API generates SSE named events: event: prefill_progress\ndata: {...}
  • Added PrefillProgressData dataclass and StreamEvent union type in API
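
As a rough illustration of the backend items above, the sketch below builds a named SSE frame from a progress record. The field names come from the PrefillProgress event listed above; the assumption that PrefillProgressData carries the same fields, the format_sse helper, and the "cmd-1" id are hypothetical and not taken from the exo codebase.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PrefillProgressData:
    # Assumed to mirror the PrefillProgress event fields.
    command_id: str
    processed_tokens: int
    total_tokens: int


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE frame; a named frame gets an extra `event:` line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


progress = PrefillProgressData(command_id="cmd-1", processed_tokens=512, total_tokens=4096)
frame = format_sse(asdict(progress), event="prefill_progress")
token_frame = format_sse({"choices": [{"delta": {"content": "Hi"}}]})
```

Regular token chunks go through the same helper without an event name, so they keep the plain data: {...} shape.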

Dashboard

  • Added PrefillProgress interface to store
  • Updated SSE parsing to handle event: lines (named events)
  • Created PrefillProgressBar.svelte with animated progress bar
  • Shows "Processing prompt: X/Y tokens" with percentage
  • Progress bar disappears when first token arrives
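
The dashboard behavior above amounts to a small piece of state: show the bar while progress events arrive, hide it on the first token. The sketch below models that logic in Python for consistency with the backend examples; the real component is Svelte, and the PrefillProgressState class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefillProgressState:
    processed: int = 0
    total: int = 0
    visible: bool = False

    def on_progress(self, processed: int, total: int) -> None:
        # Each prefill_progress event replaces the previous counts.
        self.processed, self.total, self.visible = processed, total, True

    def on_first_token(self) -> None:
        # Generation has started, so the bar is hidden.
        self.visible = False

    def label(self) -> Optional[str]:
        if not self.visible or self.total <= 0:
            return None
        pct = min(100, 100 * self.processed // self.total)
        return f"Processing prompt: {self.processed}/{self.total} tokens ({pct}%)"


state = PrefillProgressState()
state.on_progress(512, 2048)
shown = state.label()
state.on_first_token()
hidden = state.label()
```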

Why It Works

MLX's stream_generate() accepts a prompt_progress_callback(processed, total) that's called after each prefill chunk. By sending events directly from this callback (rather than yielding from the generator), progress updates are sent in real-time during prefill.
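
The callback-versus-generator distinction can be sketched without MLX. Below, fake_stream_generate is a hypothetical stand-in that mimics only the behavior described above (the callback fires after each prefill chunk, and tokens are yielded only afterwards); the queue-based transport is likewise an assumption, not exo's actual event plumbing.

```python
import queue
import threading
from typing import Callable, Iterator


def fake_stream_generate(
    prompt_tokens: int,
    prefill_step_size: int,
    prompt_progress_callback: Callable[[int, int], None],
) -> Iterator[str]:
    """Hypothetical stand-in: prefill in chunks, then yield tokens."""
    processed = 0
    while processed < prompt_tokens:
        processed = min(processed + prefill_step_size, prompt_tokens)
        prompt_progress_callback(processed, prompt_tokens)
    # Nothing is yielded until prefill has finished.
    yield from ["Hello", " world"]


def run_generation(events: queue.Queue) -> None:
    def on_progress(processed: int, total: int) -> None:
        # Pushed from inside the callback, before any token exists.
        events.put(("prefill_progress", processed, total))

    for token in fake_stream_generate(1000, 256, on_progress):
        events.put(("token", token))
    events.put(("done",))


events: queue.Queue = queue.Queue()
worker = threading.Thread(target=run_generation, args=(events,))
worker.start()
worker.join()

received = []
while not events.empty():
    received.append(events.get())
```

Because the generator yields nothing until prefill completes, a consumer that only iterates over it would see no progress at all; the callback is the only hook that runs during prefill.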

Using SSE named events (event: prefill_progress) maintains full OpenAI/Claude API compatibility - standard clients ignore named events they don't recognize, while the exo dashboard explicitly listens for them.
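
To make the compatibility argument concrete, here is a minimal SSE parser sketch with two consumers: one that, like a standard client, only looks at unnamed frames, and one that picks out prefill_progress. The parser is simplified (it strips all surrounding whitespace and ignores id:, retry:, and comment lines), and the payloads are made-up examples.

```python
from typing import Iterator, List, Optional, Tuple


def parse_sse(stream: str) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) pairs; event_name is None for unnamed frames."""
    event: Optional[str] = None
    data_lines: List[str] = []
    for line in stream.split("\n"):
        if line == "":
            # A blank line ends the current frame.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())


stream = (
    'event: prefill_progress\ndata: {"processed_tokens": 256, "total_tokens": 1024}\n\n'
    'data: {"choices": [{"delta": {"content": "Hi"}}]}\n\n'
    "data: [DONE]\n\n"
)

frames = list(parse_sse(stream))
# A client that only handles unnamed frames never sees the progress event.
standard_client = [data for name, data in frames if name is None]
# The dashboard opts in by matching on the event name.
dashboard_progress = [data for name, data in frames if name == "prefill_progress"]
```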

Test Plan

Manual Testing

  • Hardware: MacBook Pro M3 Max
  • Set prefill_step_size=256 for more frequent updates
  • Tested with long prompts (pasted large documents)
  • Verified progress bar updates incrementally during prefill
  • Confirmed progress bar disappears when generation starts
  • Tested with curl - standard data: events still work normally

Automated Testing

  • Type checker passes (0 errors)
  • All 192 tests pass
  • Dashboard builds successfully

API Compatibility

  • Named SSE events are ignored by OpenAI SDK clients
  • Regular token data uses standard data: {...} format
  • [DONE] sentinel works as expected

Note: prefill_step_size is temporarily set to 256 for testing. It should be restored to 2048 before merging to preserve production performance.
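
The trade-off in the note can be estimated with simple arithmetic. The sketch assumes one progress callback per prefill chunk, including a final partial chunk; the exact call pattern inside MLX may differ, and the 10,000-token prompt is only an example.

```python
import math


def prefill_updates(prompt_tokens: int, prefill_step_size: int) -> int:
    """Approximate number of progress updates, assuming one per chunk."""
    return math.ceil(prompt_tokens / prefill_step_size)


updates_testing = prefill_updates(10_000, 256)      # step size used for testing
updates_production = prefill_updates(10_000, 2048)  # step size to restore
```

Under that assumption, a 10,000-token prompt yields 40 updates at a step size of 256 but only 5 at 2048, so the production setting gives a coarser progress bar.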

AlexCheema, Jan 17 '26 17:01