
chore(weave): Add Annotation Queues Stats API

Open chance-wnb opened this issue 1 month ago • 3 comments

Add Annotation Queues Stats API

This PR adds a new API endpoint to retrieve statistics for multiple annotation queues in a single batch request. This is needed for the Annotation Queues List page in the frontend to efficiently display progress information for all visible queues.

Motivation

The Annotation Queues List page displays queues in a paginated table with columns showing:

  • Total number of traces/items in each queue
  • Number of completed items (completed or skipped)

Without this API, the frontend would need to make individual requests for each queue or query all queue items separately, which is inefficient for paginated list views.

Changes

Backend (Python)

New API Interface (trace_server_interface.py)

  • AnnotationQueueStatsSchema: Schema for a single queue's stats
    • queue_id: The queue identifier
    • total_items: Count of all items in the queue
    • completed_items: Count of items marked as 'completed' or 'skipped'
  • AnnotationQueuesStatsReq: Request with project_id and list of queue_ids
  • AnnotationQueuesStatsRes: Response with list of stats
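
As a rough illustration of the shapes listed above, the three types can be sketched with plain dataclasses. This is an assumption made to keep the example dependency-free: the actual definitions in trace_server_interface.py may use a different base class and field declarations.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationQueueStatsSchema:
    """Stats for a single queue (sketch; field types are assumed)."""
    queue_id: str         # the queue identifier
    total_items: int      # count of all items in the queue
    completed_items: int  # items marked 'completed' or 'skipped'


@dataclass
class AnnotationQueuesStatsReq:
    """Batch request: one project, many queues."""
    project_id: str
    queue_ids: list[str]


@dataclass
class AnnotationQueuesStatsRes:
    """Batch response: one stats entry per requested queue."""
    stats: list[AnnotationQueueStatsSchema] = field(default_factory=list)
```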

Query Builder (annotation_queues_query_builder.py)

  • Added make_queues_stats_query() function that generates an efficient SQL query using CTEs:
    • total_items_per_queue: Counts items from annotation_queue_items table
    • completed_items_per_queue: Counts distinct queue_item_ids with 'completed' or 'skipped' status from annotator_queue_items_progress table
    • Uses LEFT JOINs to return stats even for empty queues (0 items)
    • Uses arrayJoin() to ensure all requested queue_ids are in the result set
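
The query shape described above might look roughly like the following. This is a sketch, not the PR's actual SQL: the column names (project_id, queue_id, annotation_state), the String types, the `requested` CTE name, and the function name are assumptions; only the table names, the two stats CTE names, the LEFT JOINs, and the use of arrayJoin() come from the description.

```python
# Hypothetical reconstruction of the query shape; column names and types are assumed.
QUEUES_STATS_QUERY_SKETCH = """
WITH
requested AS (
    -- one row per requested queue_id, so every queue appears in the result
    SELECT arrayJoin({queue_ids:Array(String)}) AS queue_id
),
total_items_per_queue AS (
    SELECT queue_id, count() AS total_items
    FROM annotation_queue_items
    WHERE project_id = {project_id:String}
      AND has({queue_ids:Array(String)}, queue_id)
    GROUP BY queue_id
),
completed_items_per_queue AS (
    SELECT queue_id, count(DISTINCT queue_item_id) AS completed_items
    FROM annotator_queue_items_progress
    WHERE project_id = {project_id:String}
      AND has({queue_ids:Array(String)}, queue_id)
      AND annotation_state IN ('completed', 'skipped')
    GROUP BY queue_id
)
SELECT
    r.queue_id,
    coalesce(t.total_items, 0) AS total_items,
    coalesce(c.completed_items, 0) AS completed_items
FROM requested AS r
LEFT JOIN total_items_per_queue AS t ON t.queue_id = r.queue_id
LEFT JOIN completed_items_per_queue AS c ON c.queue_id = r.queue_id
"""


def make_queues_stats_query_sketch(project_id: str, queue_ids: list[str]) -> tuple[str, dict]:
    """Return the query text and its bound parameters (hypothetical helper)."""
    return QUEUES_STATS_QUERY_SKETCH, {"project_id": project_id, "queue_ids": queue_ids}
```

Driving the result from the `requested` rows is what lets empty or unknown queues come back with zero counts instead of being dropped.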

ClickHouse Implementation (clickhouse_trace_server_batched.py)

  • Implemented annotation_queues_stats() method
  • Returns empty stats list if no queue_ids provided
  • Uses tuple unpacking from result.result_rows for efficient result parsing
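
A minimal sketch of that control flow, assuming a client whose query() call returns an object with a result_rows list of (queue_id, total_items, completed_items) tuples. The function name, its signature, and the dict output are illustrative and not taken from the PR.

```python
from typing import Any


def annotation_queues_stats_sketch(
    ch_client: Any, query: str, project_id: str, queue_ids: list[str]
) -> list[dict]:
    """Illustrative only: short-circuit on empty input, then unpack rows."""
    # No queue_ids requested: skip the database round trip entirely.
    if not queue_ids:
        return []

    result = ch_client.query(
        query, parameters={"project_id": project_id, "queue_ids": queue_ids}
    )

    # Each row is expected to be (queue_id, total_items, completed_items).
    return [
        {
            "queue_id": queue_id,
            "total_items": int(total_items),
            "completed_items": int(completed_items),
        }
        for queue_id, total_items, completed_items in result.result_rows
    ]
```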

Server Interface Implementations

  • sqlite_trace_server.py: Stub that raises NotImplementedError (annotation queues not supported in SQLite)
  • external_to_internal_trace_server_adapter.py: Converts external project_id to internal format and delegates
  • caching_middleware_trace_server.py: Passthrough to next server
  • remote_http_trace_server.py: HTTP client implementation calling /annotation_queues/stats
  • cross_process_trace_server.py: IPC request handler
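
The stub and passthrough variants can be sketched as follows. The class names here are hypothetical stand-ins, not the real server classes; only the behavior (raise NotImplementedError, or delegate to the next server) comes from the list above.

```python
class SqliteServerSketch:
    """Stand-in for the SQLite server: the endpoint is declared but unsupported."""

    def annotation_queues_stats(self, req):
        raise NotImplementedError("Annotation queues are not supported in SQLite")


class PassthroughServerSketch:
    """Stand-in for a middleware layer that forwards the call unchanged."""

    def __init__(self, next_trace_server):
        self._next_trace_server = next_trace_server

    def annotation_queues_stats(self, req):
        return self._next_trace_server.annotation_queues_stats(req)
```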

Tests

Unit Tests (test_client_annotations.py)

Added three test cases:

  1. test_annotation_queues_stats - Main test with partial completion:
    • Creates 3 queues with different numbers of items (3, 5, 7)
    • Uses ClickHouse lightweight UPDATE to mark items as 'completed' or 'skipped'
    • Verifies stats correctly count both completed and skipped items
    • Tests mixed completion states (Queue 0: 2/3, Queue 1: 4/5, Queue 2: 4/7)
  2. test_annotation_queues_stats_empty_queues - Edge case for empty queues:
    • Verifies queues with no items return 0 for both total_items and completed_items
  3. test_annotation_queues_stats_no_queue_ids - Edge case for empty request:
    • Verifies empty queue_ids list returns empty stats list

Test Implementation Details

  • Uses direct ClickHouse client access via client.server._next_trace_server.ch_client
  • Uses hardcoded internal project_id (c2hhd24vdGVzdC1wcm9qZWN0) for database queries
  • Uses ClickHouse lightweight UPDATE syntax: UPDATE table SET ... WHERE ...
  • Simulates annotation workflow by updating annotator_queue_items_progress records
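
The hardcoded internal project_id is a base64 string, which can be checked directly. Whether the external-to-internal adapter uses exactly this encoding in every deployment is an assumption; the snippet only decodes the literal quoted above.

```python
import base64

# The internal project_id hardcoded in the tests.
internal_project_id = "c2hhd24vdGVzdC1wcm9qZWN0"

# Decoding yields the external "entity/project" form.
decoded = base64.b64decode(internal_project_id).decode("utf-8")
print(decoded)  # shawn/test-project
```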

API Usage

Request:

AnnotationQueuesStatsReq(
    project_id="entity/project",
    queue_ids=["queue-id-1", "queue-id-2", "queue-id-3"]
)

Response:

AnnotationQueuesStatsRes(
    stats=[
        AnnotationQueueStatsSchema(
            queue_id="queue-id-1",
            total_items=10,
            completed_items=7
        ),
        AnnotationQueueStatsSchema(
            queue_id="queue-id-2",
            total_items=5,
            completed_items=0
        ),
        AnnotationQueueStatsSchema(
            queue_id="queue-id-3",
            total_items=0,
            completed_items=0
        )
    ]
)
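
A consumer such as the list page could turn these counts into a progress value. The helper below is a hypothetical sketch (its name and the zero-total convention are assumptions), applied to the three example queues above as plain tuples.

```python
def completion_fraction(total_items: int, completed_items: int) -> float:
    """Fraction of items done; empty queues report 0.0 rather than dividing by zero."""
    if total_items <= 0:
        return 0.0
    return completed_items / total_items


# (queue_id, total_items, completed_items) taken from the example response.
example_stats = [("queue-id-1", 10, 7), ("queue-id-2", 5, 0), ("queue-id-3", 0, 0)]

progress = {
    queue_id: completion_fraction(total, completed)
    for queue_id, total, completed in example_stats
}
```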

Performance Considerations

  • Single database query for all requested queues (efficient for paginated list views)
  • Uses CTEs for readability and potential query optimization
  • Counts DISTINCT queue_item_id in progress table (handles shared pool mode)
  • Returns results for all requested queues, even if empty (consistent API contract)
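
To see why the DISTINCT count matters, here is a small in-memory model of the completed-items logic. It assumes that in shared pool mode more than one annotator can record progress on the same queue item; the row layout and the 'in_progress' state are hypothetical.

```python
def count_completed(progress_rows):
    """progress_rows: iterable of (queue_id, queue_item_id, annotator_id, state)."""
    done_items = {}
    for queue_id, queue_item_id, _annotator_id, state in progress_rows:
        if state in ("completed", "skipped"):
            # A set deduplicates items touched by several annotators,
            # mirroring count(DISTINCT queue_item_id).
            done_items.setdefault(queue_id, set()).add(queue_item_id)
    return {queue_id: len(items) for queue_id, items in done_items.items()}


rows = [
    ("q1", "item-a", "alice", "completed"),
    ("q1", "item-a", "bob", "skipped"),        # same item, second annotator
    ("q1", "item-b", "alice", "in_progress"),  # not finished, not counted
]
completed = count_completed(rows)
```

Without the deduplication, item-a would be counted twice and completed_items could exceed total_items.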

chance-wnb, Nov 22 '25 02:11

[!WARNING] This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

chance-wnb, Nov 22 '25 02:11