feat: Allow dashboard to show query canceled/failed/dead information when query exited abnormally
Changes Made
This is a sub-pr for the issue #5518
- [x] Add QueryResult and PyQueryResult to show detailed state & error of a query
- [ ] Modify Subscriber interface to accept query end result
- [ ] Daft-Dashboard add server support for
/query/<id>/end(with frontend modification)
Related Issues
#5518
Checklist
- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
docs/mkdocs.ymlnavigation - [ ] Documentation builds and is formatted properly
Greptile Overview
Greptile Summary
This PR implements comprehensive query state tracking for canceled/failed/dead queries in the Daft Dashboard. The implementation adds QueryResult with end state and error message throughout the stack - from Python runner through Rust context to dashboard backend and frontend UI. The changes properly notify subscribers when queries end abnormally (keyboard interrupt, UDF exceptions, or general exceptions) and display appropriate status badges with error details in the dashboard.
Key Changes:
- Added
QueryEndStateenum (Finished/Canceled/Failed/Dead) andQueryResultstruct in Rust - Updated
Subscribertrait to acceptQueryResultinon_query_endcallback - Modified
native_runner.pyto catch exceptions and notify with appropriate end states - Enhanced dashboard backend to handle Canceled/Failed states with error messages
- Added UI components for Canceled/Failed states with tooltips showing error details
- Comprehensive tests validate all query end states
Issues Found:
-
todo!()panic inengine.rs:325for Dead state will crash dashboard if Dead state is received - Missing
duration_secfield in TypeScriptFailedStatusandCanceledStatustypes causes type mismatch with backend data
Confidence Score: 3/5
- Safe to merge after fixing critical Dead state panic and TypeScript type mismatches
- The implementation is mostly solid with proper state propagation through the stack and good test coverage. However, two logical issues prevent this from being merge-ready: (1) the
todo!()in engine.rs will panic if Dead state is received, potentially crashing the dashboard, and (2) TypeScript types are missing duration_sec fields that the backend provides, which could cause runtime type errors in the frontend - Pay close attention to
src/daft-dashboard/src/engine.rs(Dead state panic) andsrc/daft-dashboard/frontend/src/hooks/use-queries.ts(missing duration fields)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| daft/runners/native_runner.py | 4/5 | Added comprehensive exception handling with try-except-else to notify subscribers of query end state (Finished/Canceled/Failed) |
| src/daft-context/src/python.rs | 5/5 | Added PyQueryResult type and updated notify_query_end to accept query result with end state and error message |
| src/daft-context/src/subscribers/mod.rs | 5/5 | Added QueryResult struct with end state and error message; updated Subscriber trait on_query_end signature |
| src/daft-dashboard/src/engine.rs | 3/5 | Added support for Canceled/Failed states in query_end endpoint; Dead state implementation is marked as todo!() |
| src/daft-dashboard/src/state.rs | 5/5 | Added Canceled and Failed variants to QueryState and QueryStatus enums with error messages and duration tracking |
| tests/test_subscribers.py | 5/5 | Added comprehensive tests for Finished, Failed, and Canceled query states with MockSubscriber validation |
Sequence Diagram
sequenceDiagram
participant User
participant NativeRunner
participant DaftContext
participant Subscriber
participant Dashboard
User->>NativeRunner: run_iter(builder)
NativeRunner->>DaftContext: notify_query_start(query_id, metadata)
DaftContext->>Subscriber: on_query_start(query_id, metadata)
Subscriber->>Dashboard: POST /query/{id}/start
NativeRunner->>DaftContext: notify_optimization_start(query_id)
DaftContext->>Subscriber: on_optimization_start(query_id)
Subscriber->>Dashboard: POST /query/{id}/plan_start
NativeRunner->>NativeRunner: builder.optimize()
NativeRunner->>DaftContext: notify_optimization_end(query_id, plan)
DaftContext->>Subscriber: on_optimization_end(query_id, plan)
Subscriber->>Dashboard: POST /query/{id}/plan_end
NativeRunner->>NativeRunner: execute plan
alt Success Path
loop For each result
NativeRunner->>DaftContext: notify_result_out(query_id, result)
DaftContext->>Subscriber: on_result_out(query_id, result)
NativeRunner-->>User: yield result
end
NativeRunner->>DaftContext: notify_query_end(query_id, Finished)
DaftContext->>Subscriber: on_query_end(query_id, Finished)
Subscriber->>Dashboard: POST /query/{id}/end (state=Finished)
else KeyboardInterrupt
NativeRunner->>DaftContext: notify_query_end(query_id, Canceled)
DaftContext->>Subscriber: on_query_end(query_id, Canceled)
Subscriber->>Dashboard: POST /query/{id}/end (state=Canceled)
NativeRunner-->>User: raise KeyboardInterrupt
else UDFException
NativeRunner->>DaftContext: notify_query_end(query_id, Failed)
DaftContext->>Subscriber: on_query_end(query_id, Failed)
Subscriber->>Dashboard: POST /query/{id}/end (state=Failed)
NativeRunner-->>User: raise UDFException
else Other Exception
NativeRunner->>DaftContext: notify_query_end(query_id, Failed)
DaftContext->>Subscriber: on_query_end(query_id, Failed)
Subscriber->>Dashboard: POST /query/{id}/end (state=Failed)
NativeRunner-->>User: raise Exception
end
Dashboard->>Dashboard: Update QueryState (Finished/Canceled/Failed)
Dashboard->>User: Display final status in UI
@greptile
Hi @srilman, this PR is ready for review, I also mentioned the impl in slack channel. You may take a look when you have time
I will finish the "liveness check" part in next PR
Also, I wonder how should we add unit-test or tests for this kind of PR, it is an end-to-end with dashboard-server case. I don't see currently have any good framework inside repo to perform it
Codecov Report
:x: Patch coverage is 56.31068% with 45 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 70.37%. Comparing base (01b06e3) to head (6a83a8a).
:warning: Report is 4 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #5576 +/- ##
==========================================
- Coverage 70.38% 70.37% -0.02%
==========================================
Files 1021 1021
Lines 131276 131360 +84
==========================================
+ Hits 92403 92448 +45
- Misses 38873 38912 +39
| Files with missing lines | Coverage Δ | |
|---|---|---|
| daft/context.py | 87.64% <100.00%> (ø) |
|
| daft/runners/native_runner.py | 80.00% <100.00%> (+4.63%) |
:arrow_up: |
| daft/subscribers/abc.py | 72.22% <100.00%> (ø) |
|
| src/daft-context/src/lib.rs | 95.31% <100.00%> (+0.03%) |
:arrow_up: |
| src/daft-context/src/subscribers/mod.rs | 44.44% <ø> (ø) |
|
| src/common/metrics/src/lib.rs | 39.39% <50.00%> (+0.68%) |
:arrow_up: |
| src/daft-context/src/subscribers/debug.rs | 0.00% <0.00%> (ø) |
|
| src/daft-context/src/subscribers/python.rs | 92.10% <83.33%> (-0.69%) |
:arrow_down: |
| src/daft-context/src/subscribers/dashboard.rs | 2.19% <0.00%> (-0.03%) |
:arrow_down: |
| src/daft-context/src/python.rs | 82.19% <80.00%> (-1.00%) |
:arrow_down: |
| ... and 2 more |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Hi @colin-ho , could you please help take a look at this PR? Thanks a lot
Also, I wonder how should we add unit-test or tests for this kind of PR, it is an end-to-end with dashboard-server case. I don't see currently have any good framework inside repo to perform it
We do have tests in tests/test_subscribers.py you can add there to test that failed / cancelled events are captured by the MockSubscriber. However we do not have end to end tests with the dashboard yet.
Hi @colin-ho, I've added tests and fix the comments. Could you please take another look?
Re: the dead state TODO mark. I will implement them in the next PR in order to fully close the issue #5518. Or do you want me to remove the "Dead" state in this PR and add it back in the next PR?
@greptile
Hi @colin-ho, I've added tests and fix the comments. Could you please take another look?
Re: the dead state TODO mark. I will implement them in the next PR in order to fully close the issue #5518. Or do you want me to remove the "Dead" state in this PR and add it back in the next PR?
No worries, feel free to do it in the next PR