feat(weave): Add Database Migration for Queue-Based Call Annotation System
Summary
This PR introduces the database schema for a queue-based call annotation system in Weave Trace. The system enables teams to create annotation queues, manually curate calls for review, and collaborate on systematic call annotation with multi-annotator support.
https://wandb.atlassian.net/browse/WB-29392
Database Schema Overview
The migration creates three core tables that work together to support annotation workflows:
1. annotation_queues - Queue Definitions
Stores queue metadata and configuration:
- Queue identity: UUID-based unique identifier
- Scorers: Array of weave refs (e.g.,
weave:///entity/project/scorer/name:version) for multi-dimensional annotation - Claim timeout: Configurable duration before claimed items expire (default: 30 minutes)
- Soft delete support:
deleted_atfor non-destructive queue archival
Key Features:
- Native
Array(String)for scorer refs (efficient access without JSON parsing) - Lightweight UPDATE support for modifying queue properties
2. annotation_queue_items - Queue Membership
Tracks which calls belong to each queue using a proxy pattern:
- Call membership: Links queues to calls without modifying the immutable
calls_mergedtable - Cached call fields: Snapshots of
started_at,ended_at,op_name,trace_idat add-time for fast sorting/filtering without joins - Display fields: Per-batch configuration of which JSON paths to show annotators (e.g.,
['input.prompt', 'output.text']) - Multi-queue support: Same call can appear in multiple queues with independent state
Why the Proxy Pattern?
- ✅ Keeps annotation workflow metadata separate from core call data
- ✅ Enables multi-queue support (same call in different queues)
- ✅ Fast queries without expensive joins to the massive
calls_mergedtable - ✅ Snapshot semantics: queue items capture call state at add-time
3. annotator_queue_items_progress - Workflow State Tracking
Manages per-annotator progress through annotation workflows:
- Workflow states:
pending→claimed→completedorskipped - Claim management: Prevents duplicate work via temporary claims with expiration
- Denormalized fields:
queue_idandcall_idfor efficient querying without joins - Multi-annotator ready: Schema supports both shared pool (MVP) and future N-annotations-per-call workflows
Claim Expiration Mechanism:
- When annotator claims an item:
claim_expires_at = now() + claim_timeout_seconds - Active claims prevent other annotators from working on the same item
- Expired claims can be reclaimed by any annotator (no background jobs needed)
Index Strategy (ORDER BY Clause Rationale)
ClickHouse's ORDER BY clause determines both physical sort order and the primary key index. Our indexing strategy prioritizes direct lookups by UUID and efficient JOINs over uniqueness enforcement (which is handled at the application layer).
annotation_queues: ORDER BY (project_id, id)
Optimizes:
- ✅ Direct queue lookups:
SELECT * FROM annotation_queues WHERE id = ? - ✅ List queues in a project:
SELECT * FROM annotation_queues WHERE project_id = ?
Rationale: Queues are accessed primarily by their UUID identifier. Project-level scoping ensures tenant isolation.
annotation_queue_items: ORDER BY (project_id, queue_id, id)
Optimizes:
- ✅ Fast JOIN lookups:
JOIN annotation_queue_items qi ON qi.id = p.queue_item_id(primary use case) - ✅ Direct item access:
SELECT * FROM annotation_queue_items WHERE id = ? - ✅ List all items in queue:
SELECT * FROM annotation_queue_items WHERE queue_id = ?
Why NOT ORDER BY (project_id, queue_id, call_id)?
I considered including call_id for duplicate checking optimization:
-- Duplicate check query (runs during batch add)
SELECT call_id FROM annotation_queue_items
WHERE project_id = ? AND queue_id = ? AND call_id IN (...)
However, we chose to prioritize id because:
call_iddoesn't provide uniqueness: The same call can appear in multiple queues (multi-queue support is a core feature)- JOIN frequency >> duplicate check frequency:
- JOINs happen on every annotation action (read display fields, check state)
- Duplicate checks only happen during batch add operations (less frequent)
- Small dataset per queue: With typical queue sizes (100s-1000s of items), scanning queue granules to filter by
call_idis fast enough (microseconds) - Direct item API access: Many endpoints use
GET /annotation_queue_items/{item_id}, which requires fast UUID lookup
Trade-off Accepted: Duplicate checking requires scanning all granules in a queue and filtering by call_id, but this is negligible for typical queue sizes.
annotator_queue_items_progress: ORDER BY (project_id, queue_id, id)
Optimizes:
- ✅ Direct progress record access:
PATCH /annotation_queue_progress/{progress_id} - ✅ List all progress in queue:
SELECT * FROM annotator_queue_items_progress WHERE queue_id = ?
Why NOT include annotation_state in ORDER BY?
I considered ORDER BY (project_id, queue_id, annotation_state, id) to optimize:
-- Query for available work
SELECT * FROM annotator_queue_items_progress
WHERE queue_id = ? AND annotation_state = 'pending'
However, I chose NOT to include annotation_state because:
- Causes data reorganization on UPDATE: Including
annotation_statein the sort key means every state transition (pending→claimed→completed) physically moves the row on disk - High update frequency: Items transition through states constantly during annotation workflows
- Defeats lightweight UPDATE optimization: The migration specifically enables ClickHouse's lightweight UPDATE feature (
enable_block_number_column), which works best when updated columns are NOT in the sort key - Minimal query performance impact: With typical queue sizes (100s-1000s), scanning all queue granules and filtering by state is fast enough
ClickHouse Granule Scanning:
- Without
annotation_statein ORDER BY: Scan ~2 granules (8192 rows each) for a 10K-item queue - With
annotation_statein ORDER BY: Scan ~0.25 granules (only 'pending' items) - At this scale, the 4x difference is negligible (microseconds) compared to data reorganization overhead (milliseconds + merge overhead)
Trade-off Accepted: State filtering requires scanning all queue granules, but this optimizes for the more critical update path.
Technical Details
Lightweight UPDATE Support
All three tables enable ClickHouse's lightweight UPDATE feature:
SETTINGS
enable_block_number_column = 1,
enable_block_offset_column = 1;
This allows efficient in-place updates for:
display_fieldsinannotation_queue_items(if needed)annotation_state,claimed_at,completed_atinannotator_queue_items_progress
Note: This adds minimal overhead to SELECT queries (patch application) but dramatically improves UPDATE performance.
Soft Delete Pattern
All tables include deleted_at Nullable(DateTime64(3)) for non-destructive deletion:
- NULL = active record
- Non-NULL = soft deleted (timestamp of deletion)
Related Documentation
Full design document: Queue-Based Call Annotation System.md
This PR is database-only and introduces no breaking changes. The tables are new and do not modify existing schema.
- #5781

- #5777

- #5772
👈 (View in Graphite) master
This stack of pull requests is managed by Graphite. Learn more about stacking.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:loudspeaker: Thoughts on this report? Let us know!
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=e082febddcc427d66c059c433a2484dbdedeb543
❌ Documentation Reference Check Failed
No documentation reference found in the PR description. Please add either:
- A link to a docs PR (format: wandb/docs#XXX or https://github.com/wandb/docs/pull/XXX)
- A Jira ticket reference (format: DOCS-XXX or https://wandb.atlassian.net/browse/DOCS-XXX)
This check is required for all PRs except those that start with "chore(weave)" or explicitly state "docs are not required". Please update your PR description and this check will run again automatically.