exo
exo copied to clipboard
feat: add uncertainty visualization with token-level logprobs
Motivation
Adds uncertainty visualization to the chat interface, allowing users to see token-level confidence scores and regenerate responses from any point in the generation. This enables users to:
- Understand model confidence at each token
- Explore alternative completions by regenerating from uncertain tokens
- Debug and analyze model behavior
Changes
Uncertainty Visualization
- Add
TokenHeatmapcomponent showing token-level probability coloring - Toggle uncertainty view per message with bar chart icon
- Display tooltip with probability, logprob, and top alternative tokens on hover
Regenerate from Token
- Add "Regenerate from here" button in token tooltip
- Use
continue_final_messagein chat template to continue within same turn (no EOS tokens) - Add
continue_from_prefixflag toChatCompletionTaskParams
Request Cancellation
- Add
AbortControllerto cancel in-flight requests when regenerating mid-generation - Handle
BrokenResourceErrorserver-side when client disconnects gracefully
Additional APIs
- Add Claude Messages API support (
/v1/messages) - Add OpenAI Responses API support (
/v1/responses)
Why It Works
-
Proper continuation: Using
continue_final_message=Trueinstead ofadd_generation_prompt=Truekeeps the assistant turn open, allowing the model to continue naturally from the prefix without end-of-turn markers -
Clean cancellation: AbortController aborts the HTTP request, and server catches
BrokenResourceErrorto avoid crashes - Stable hover during generation: TokenHeatmap tracks hover by index (stable across re-renders) with longer hide delay during generation
Test Plan
Manual Testing
- Send a message and verify logprobs are collected
- Enable uncertainty view and verify token coloring based on probability
- Hover over tokens to see tooltip with alternatives
- Click "Regenerate from here" on a token mid-response
- Verify the response continues naturally from that point
- Verify aborting mid-generation and regenerating works without server crash
Automated Testing
- Added tests for Claude Messages API adapter
- Added tests for OpenAI Responses API adapter
🤖 Generated with Claude Code