
feat: add uncertainty visualization with token-level logprobs

AlexCheema opened this issue 1 month ago

Motivation

Adds uncertainty visualization to the chat interface, allowing users to see token-level confidence scores and regenerate responses from any point in the generation. This enables users to:

  • Understand model confidence at each token
  • Explore alternative completions by regenerating from uncertain tokens
  • Debug and analyze model behavior

Changes

Uncertainty Visualization

  • Add TokenHeatmap component showing token-level probability coloring
  • Toggle uncertainty view per message with bar chart icon
  • Display tooltip with probability, logprob, and top alternative tokens on hover
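The probability-based coloring can be sketched as a simple mapping from a token's logprob to a color, shading confident tokens green and uncertain ones red. This is illustrative only; `token_color` and the palette are assumptions, not the actual TokenHeatmap code.

```python
import math

def token_color(logprob: float) -> str:
    """Map a token logprob to an RGB hex color: green for confident
    tokens, red for uncertain ones (illustrative palette)."""
    prob = math.exp(logprob)       # logprob -> probability in (0, 1]
    red = int(255 * (1.0 - prob))  # low probability -> more red
    green = int(255 * prob)        # high probability -> more green
    return f"#{red:02x}{green:02x}40"

print(token_color(0.0))    # p = 1.0, fully confident -> "#00ff40"
print(token_color(-2.3))   # p ~ 0.10, uncertain -> mostly red
```

The same `math.exp(logprob)` conversion gives the probability shown in the hover tooltip.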

Regenerate from Token

  • Add "Regenerate from here" button in token tooltip
  • Use continue_final_message in chat template to continue within same turn (no EOS tokens)
  • Add continue_from_prefix flag to ChatCompletionTaskParams
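A minimal sketch of what the flag might look like on the task params: only `continue_from_prefix` is named in this PR; the other fields and the dataclass shape are illustrative assumptions, not exo's actual definition.

```python
from dataclasses import dataclass, field

@dataclass
class ChatCompletionTaskParams:
    # Hypothetical fields for illustration
    messages: list = field(default_factory=list)
    max_tokens: int = 512
    # When True, the final assistant message is treated as a prefix to
    # continue within the same turn, not as a finished response.
    continue_from_prefix: bool = False

params = ChatCompletionTaskParams(
    messages=[
        {"role": "user", "content": "Explain logprobs"},
        {"role": "assistant", "content": "Logprobs are"},  # prefix to continue
    ],
    continue_from_prefix=True,
)
print(params.continue_from_prefix)  # True
```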

Request Cancellation

  • Add AbortController to cancel in-flight requests when regenerating mid-generation
  • Gracefully handle BrokenResourceError server-side when the client disconnects
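The server-side pattern looks roughly like this: when the client aborts mid-stream, writing the next chunk raises `BrokenResourceError` (in the real code, anyio's exception), and catching it lets generation stop cleanly instead of crashing. This is a runnable sketch, so `BrokenResourceError` is stubbed and `send_chunk`/`stream_tokens` are hypothetical names.

```python
# Stub so the sketch runs without anyio; the real exception is
# anyio.BrokenResourceError, raised when the peer has gone away.
class BrokenResourceError(Exception):
    pass

def stream_tokens(tokens, send_chunk):
    """Stream tokens to the client, stopping quietly on disconnect."""
    sent = []
    try:
        for tok in tokens:
            send_chunk(tok)   # raises if the client has disconnected
            sent.append(tok)
    except BrokenResourceError:
        pass                  # client aborted: stop generating, no crash
    return sent

def flaky_send(tok):
    if tok == "c":            # simulate the client aborting here
        raise BrokenResourceError

result = stream_tokens(["a", "b", "c", "d"], flaky_send)
print(result)  # tokens delivered before the disconnect: ['a', 'b']
```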

Additional APIs

  • Add Claude Messages API support (/v1/messages)
  • Add OpenAI Responses API support (/v1/responses)

Why It Works

  • Proper continuation: using continue_final_message=True instead of add_generation_prompt=True keeps the assistant turn open, so the model continues naturally from the prefix without emitting end-of-turn markers
  • Clean cancellation: AbortController aborts the HTTP request, and the server catches BrokenResourceError to avoid crashing
  • Stable hover during generation: TokenHeatmap tracks hover by token index (stable across re-renders) and uses a longer hide delay while generation is in progress
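The continuation behavior above can be illustrated with a hand-rolled ChatML-style renderer. This is a sketch of the template semantics, not exo's implementation (which relies on the model's chat template, e.g. `apply_chat_template` in Hugging Face transformers); the `<|im_start|>`/`<|im_end|>` markers are illustrative.

```python
def render(messages, continue_final_message=False):
    """Minimal ChatML-style renderer. With continue_final_message=True
    the last assistant turn is left open (no <|im_end|>), so generation
    resumes inside that turn instead of starting a fresh one."""
    out = []
    for i, m in enumerate(messages):
        out.append(f"<|im_start|>{m['role']}\n{m['content']}")
        is_last = i == len(messages) - 1
        if not (is_last and continue_final_message):
            out.append("<|im_end|>\n")
    return "".join(out)

msgs = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello, I"},  # prefix to continue
]
prompt = render(msgs, continue_final_message=True)
print(prompt.endswith("Hello, I"))  # the assistant turn stays open: True
```

Because the prompt ends mid-turn, the model's next tokens extend "Hello, I" rather than opening a new assistant message.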

Test Plan

Manual Testing

  • Send a message and verify logprobs are collected
  • Enable uncertainty view and verify token coloring based on probability
  • Hover over tokens to see tooltip with alternatives
  • Click "Regenerate from here" on a token mid-response
  • Verify the response continues naturally from that point
  • Verify aborting mid-generation and regenerating works without server crash

Automated Testing

  • Added tests for Claude Messages API adapter
  • Added tests for OpenAI Responses API adapter

🤖 Generated with Claude Code

AlexCheema · Jan 17 '26 17:01