[BUG] Inconsistency in Flyte console UI
Describe the bug
We have a workflow that has failed, but some of its sub-workflows have been stuck in Running status for 400+ hours. The child Python tasks have status UNKNOWN.
The phase column in flyteadmindb.task_executions for this task (sub-workflow) shows SUCCEEDED and the workflow status is FAILED, but the Flyte console UI shows that this task (sub-workflow) is still running.
We were on Flyte 1.11 (same flytekit version) when this workflow was triggered, but we upgraded the Flyte backend to 1.15 one week after the workflow start time. Could this be a bug in Flyte console? Is there anything we can do to fix this?
Expected behavior
Flyte console should show the same status as flyteadmindb.task_executions.
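For illustration, the expected behavior can be written as a small consistency check. This is only a sketch: `PhaseReport`, `find_mismatches`, and the `TERMINAL_PHASES` set are hypothetical names and not part of Flyte, and the phase values are assumed to have been copied by hand from the `phase` column of flyteadmindb.task_executions and from what the console displays.

```python
from dataclasses import dataclass

# Assumption: these phase names are treated as terminal for this check.
TERMINAL_PHASES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED_OUT", "SKIPPED", "RECOVERED"}


@dataclass(frozen=True)
class PhaseReport:
    """Hypothetical record pairing two views of the same node."""
    node_id: str
    db_phase: str  # phase column read from flyteadmindb.task_executions
    ui_phase: str  # phase displayed by Flyte console


def find_mismatches(workflow_phase, reports):
    """Return (node_id, reason) pairs for nodes whose reported state is inconsistent."""
    mismatches = []
    for report in reports:
        if report.db_phase != report.ui_phase:
            # The console disagrees with the database, as described in this issue.
            reason = f"database says {report.db_phase}, console shows {report.ui_phase}"
            mismatches.append((report.node_id, reason))
        elif workflow_phase in TERMINAL_PHASES and report.ui_phase not in TERMINAL_PHASES:
            # Both views agree, but the node is still active under a finished workflow.
            reason = f"workflow is {workflow_phase} but node is still {report.ui_phase}"
            mismatches.append((report.node_id, reason))
    return mismatches


reports = [
    PhaseReport("n0", "SUCCEEDED", "SUCCEEDED"),
    PhaseReport("n2", "SUCCEEDED", "RUNNING"),
]
for node_id, reason in find_mismatches("FAILED", reports):
    print(f"{node_id}: {reason}")
```

With the values from this report (database SUCCEEDED, console RUNNING, workflow FAILED), the check flags the sub-workflow node and leaves consistent nodes alone.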
Additional context to reproduce
No response
Screenshots
Are you sure this issue hasn't been raised already?
- [x] Yes
Have you read the Code of Conduct?
- [x] Yes
This is a known issue: Flyte Console can display subworkflows or tasks as "Running" or "Unknown" even when the database and backend report them as "Succeeded" or "Failed". This UI inconsistency has been reported in several issues, especially when the parent workflow is in a terminal state but child nodes or subworkflows remain "Running" in the UI, or when the sidebar/task panel does not refresh automatically (see Sources below).
Workarounds include refreshing the page, deselecting and reselecting the node, or upgrading to the latest Flyte Console version, as some UI refresh bugs have been fixed in recent releases. However, the underlying eventing and state propagation issues between FlytePropeller, FlyteAdmin, and the UI are still being improved. Would you like more detail on the technical causes or possible backend mitigations?
Sources:
- Flyte Issue #3931
- Flyte Issue #505
- Flyte v1.9.0 Changelog
- Flyte Issue #3822 (https://github.com/flyteorg/flyte/issues/3822)
This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.
@ababbar-stripe ever since you upgraded the backend to 1.15, have you seen this behavior again?
No, we didn't see this again. Is this a known issue in Flyte 1.11?
Hello 👋, this issue has been inactive for over 90 days. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Hello 👋, this issue has been inactive for over 90 days and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏
cc @davidmirror-ops
As a follow-up, this issue was not resolved by moving to 1.15. We encountered it again last week. Again, the phase column in flyteadmindb.task_executions for this sub-workflow shows SUCCEEDED while the actual workflow status is FAILED, and the Flyte console UI shows RUNNING.
We are currently on flytekit 1.15.3.
@ababbar-stripe is the n2 node Running because it's retrying?
#take