Daft feat: Allow dashboard to show query canceled/failed/dead information when query exited abnormally

Changes Made

This is a sub-pr for the issue #5518

[x] Add QueryResult and PyQueryResult to show detailed state & error of a query
[ ] Modify Subscriber interface to accept query end result
[ ] Daft-Dashboard add server support for /query/<id>/end (with frontend modification)

Related Issues

#5518

Checklist

[ ] Documented in API Docs (if applicable)
[ ] Documented in User Guide (if applicable)
[ ] If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
[ ] Documentation builds and is formatted properly

Nov 14 '25 03:11 VOID001

Greptile Overview

Greptile Summary

This PR implements comprehensive query state tracking for canceled/failed/dead queries in the Daft Dashboard. The implementation adds QueryResult with end state and error message throughout the stack - from Python runner through Rust context to dashboard backend and frontend UI. The changes properly notify subscribers when queries end abnormally (keyboard interrupt, UDF exceptions, or general exceptions) and display appropriate status badges with error details in the dashboard.

Key Changes:

Added QueryEndState enum (Finished/Canceled/Failed/Dead) and QueryResult struct in Rust
Updated Subscriber trait to accept QueryResult in on_query_end callback
Modified native_runner.py to catch exceptions and notify with appropriate end states
Enhanced dashboard backend to handle Canceled/Failed states with error messages
Added UI components for Canceled/Failed states with tooltips showing error details
Comprehensive tests validate all query end states

Issues Found:

todo!() panic in engine.rs:325 for Dead state will crash dashboard if Dead state is received
Missing duration_sec field in TypeScript FailedStatus and CanceledStatus types causes type mismatch with backend data

Confidence Score: 3/5

Safe to merge after fixing critical Dead state panic and TypeScript type mismatches
The implementation is mostly solid with proper state propagation through the stack and good test coverage. However, two logical issues prevent this from being merge-ready: (1) the todo!() in engine.rs will panic if Dead state is received, potentially crashing the dashboard, and (2) TypeScript types are missing duration_sec fields that the backend provides, which could cause runtime type errors in the frontend
Pay close attention to src/daft-dashboard/src/engine.rs (Dead state panic) and src/daft-dashboard/frontend/src/hooks/use-queries.ts (missing duration fields)

Important Files Changed

File Analysis

Filename	Score	Overview
daft/runners/native_runner.py	4/5	Added comprehensive exception handling with try-except-else to notify subscribers of query end state (Finished/Canceled/Failed)
src/daft-context/src/python.rs	5/5	Added `PyQueryResult` type and updated `notify_query_end` to accept query result with end state and error message
src/daft-context/src/subscribers/mod.rs	5/5	Added `QueryResult` struct with end state and error message; updated `Subscriber` trait `on_query_end` signature
src/daft-dashboard/src/engine.rs	3/5	Added support for Canceled/Failed states in `query_end` endpoint; Dead state implementation is marked as `todo!()`
src/daft-dashboard/src/state.rs	5/5	Added Canceled and Failed variants to QueryState and QueryStatus enums with error messages and duration tracking
tests/test_subscribers.py	5/5	Added comprehensive tests for Finished, Failed, and Canceled query states with MockSubscriber validation

Sequence Diagram

sequenceDiagram
    participant User
    participant NativeRunner
    participant DaftContext
    participant Subscriber
    participant Dashboard
    
    User->>NativeRunner: run_iter(builder)
    NativeRunner->>DaftContext: notify_query_start(query_id, metadata)
    DaftContext->>Subscriber: on_query_start(query_id, metadata)
    Subscriber->>Dashboard: POST /query/{id}/start
    
    NativeRunner->>DaftContext: notify_optimization_start(query_id)
    DaftContext->>Subscriber: on_optimization_start(query_id)
    Subscriber->>Dashboard: POST /query/{id}/plan_start
    
    NativeRunner->>NativeRunner: builder.optimize()
    
    NativeRunner->>DaftContext: notify_optimization_end(query_id, plan)
    DaftContext->>Subscriber: on_optimization_end(query_id, plan)
    Subscriber->>Dashboard: POST /query/{id}/plan_end
    
    NativeRunner->>NativeRunner: execute plan
    
    alt Success Path
        loop For each result
            NativeRunner->>DaftContext: notify_result_out(query_id, result)
            DaftContext->>Subscriber: on_result_out(query_id, result)
            NativeRunner-->>User: yield result
        end
        NativeRunner->>DaftContext: notify_query_end(query_id, Finished)
        DaftContext->>Subscriber: on_query_end(query_id, Finished)
        Subscriber->>Dashboard: POST /query/{id}/end (state=Finished)
    else KeyboardInterrupt
        NativeRunner->>DaftContext: notify_query_end(query_id, Canceled)
        DaftContext->>Subscriber: on_query_end(query_id, Canceled)
        Subscriber->>Dashboard: POST /query/{id}/end (state=Canceled)
        NativeRunner-->>User: raise KeyboardInterrupt
    else UDFException
        NativeRunner->>DaftContext: notify_query_end(query_id, Failed)
        DaftContext->>Subscriber: on_query_end(query_id, Failed)
        Subscriber->>Dashboard: POST /query/{id}/end (state=Failed)
        NativeRunner-->>User: raise UDFException
    else Other Exception
        NativeRunner->>DaftContext: notify_query_end(query_id, Failed)
        DaftContext->>Subscriber: on_query_end(query_id, Failed)
        Subscriber->>Dashboard: POST /query/{id}/end (state=Failed)
        NativeRunner-->>User: raise Exception
    end
    
    Dashboard->>Dashboard: Update QueryState (Finished/Canceled/Failed)
    Dashboard->>User: Display final status in UI

Nov 14 '25 03:11 greptile-apps[bot]

@greptile

Nov 16 '25 15:11 VOID001

Hi @srilman, this PR is ready for review, I also mentioned the impl in slack channel. You may take a look when you have time

I will finish the "liveness check" part in next PR

Nov 17 '25 09:11 VOID001

Also, I wonder how should we add unit-test or tests for this kind of PR, it is an end-to-end with dashboard-server case. I don't see currently have any good framework inside repo to perform it

Nov 17 '25 09:11 VOID001

Codecov Report

:x: Patch coverage is 56.31068% with 45 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 70.37%. Comparing base (01b06e3) to head (6a83a8a). :warning: Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
src/daft-dashboard/src/engine.rs	0.00%	21 Missing :warning:
src/daft-dashboard/src/state.rs	0.00%	11 Missing :warning:
src/daft-context/src/python.rs	80.00%	7 Missing :warning:
src/daft-context/src/subscribers/dashboard.rs	0.00%	3 Missing :warning:
src/common/metrics/src/lib.rs	50.00%	1 Missing :warning:
src/daft-context/src/subscribers/debug.rs	0.00%	1 Missing :warning:
src/daft-context/src/subscribers/python.rs	83.33%	1 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5576      +/-   ##
==========================================
- Coverage   70.38%   70.37%   -0.02%     
==========================================
  Files        1021     1021              
  Lines      131276   131360      +84     
==========================================
+ Hits        92403    92448      +45     
- Misses      38873    38912      +39

Files with missing lines	Coverage Δ
daft/context.py	`87.64% <100.00%> (ø)`
daft/runners/native_runner.py	`80.00% <100.00%> (+4.63%)`	:arrow_up:
daft/subscribers/abc.py	`72.22% <100.00%> (ø)`
src/daft-context/src/lib.rs	`95.31% <100.00%> (+0.03%)`	:arrow_up:
src/daft-context/src/subscribers/mod.rs	`44.44% <ø> (ø)`
src/common/metrics/src/lib.rs	`39.39% <50.00%> (+0.68%)`	:arrow_up:
src/daft-context/src/subscribers/debug.rs	`0.00% <0.00%> (ø)`
src/daft-context/src/subscribers/python.rs	`92.10% <83.33%> (-0.69%)`	:arrow_down:
src/daft-context/src/subscribers/dashboard.rs	`2.19% <0.00%> (-0.03%)`	:arrow_down:
src/daft-context/src/python.rs	`82.19% <80.00%> (-1.00%)`	:arrow_down:
... and 2 more

... and 8 files with indirect coverage changes

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Nov 17 '25 10:11 codecov[bot]

Hi @colin-ho , could you please help take a look at this PR? Thanks a lot

Nov 19 '25 03:11 VOID001

Also, I wonder how should we add unit-test or tests for this kind of PR, it is an end-to-end with dashboard-server case. I don't see currently have any good framework inside repo to perform it

We do have tests in tests/test_subscribers.py you can add there to test that failed / cancelled events are captured by the MockSubscriber. However we do not have end to end tests with the dashboard yet.

Nov 20 '25 05:11 colin-ho

Hi @colin-ho, I've added tests and fix the comments. Could you please take another look?

Re: the dead state TODO mark. I will implement them in the next PR in order to fully close the issue #5518. Or do you want me to remove the "Dead" state in this PR and add it back in the next PR?

Nov 22 '25 10:11 VOID001

@greptile

Nov 22 '25 10:11 VOID001

Hi @colin-ho, I've added tests and fix the comments. Could you please take another look?

Re: the dead state TODO mark. I will implement them in the next PR in order to fully close the issue #5518. Or do you want me to remove the "Dead" state in this PR and add it back in the next PR?

No worries, feel free to do it in the next PR

Nov 24 '25 19:11 colin-ho