[BUG] Inconsistency in Flyte console UI

Open ababbar-stripe opened this issue 8 months ago • 9 comments

Describe the bug

We have a workflow that has failed, but some sub-workflows are stuck in Running status for 400+ hr. The child Python Tasks have status UNKNOWN.

The phase column in flyteadmindb.task_executions for this task (sub-workflow) shows SUCCEEDED, workflow status is FAILED, but the flyte console UI shows that this task (sub-workflow) is still running.

We were on Flyte 1.11 (same flytekit version) when this workflow was triggered, but we upgraded the Flyte backend to 1.15 one week after the workflow start time. Could this be a bug in Flyte console? Is there anything we can do to fix this?

Expected behavior

Flyte console should show the same status as flyteadmindb.task_executions
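
The expected behavior above amounts to a consistency check between two views of the same execution. Below is a minimal Python sketch of that check, assuming the phases have already been collected into plain dictionaries keyed by node ID. The function name, the node IDs, and the sample phases are hypothetical and for illustration only; nothing here calls the FlyteAdmin API or queries flyteadmindb.

```python
def find_phase_mismatches(db_phases, ui_phases):
    """Return {node_id: (db_phase, ui_phase)} for nodes where the two views differ.

    Both arguments are hypothetical {node_id: phase} mappings, e.g. one built
    from flyteadmindb.task_executions and one read off the console by hand.
    """
    mismatches = {}
    for node_id, db_phase in db_phases.items():
        # A node missing from the UI view is treated as UNKNOWN.
        ui_phase = ui_phases.get(node_id, "UNKNOWN")
        if ui_phase != db_phase:
            mismatches[node_id] = (db_phase, ui_phase)
    return mismatches


# Illustrative data only: the database says n2 succeeded, the UI still shows RUNNING.
db_phases = {"n0": "SUCCEEDED", "n1": "FAILED", "n2": "SUCCEEDED"}
ui_phases = {"n0": "SUCCEEDED", "n1": "FAILED", "n2": "RUNNING"}

mismatches = find_phase_mismatches(db_phases, ui_phases)
print(mismatches)
```

With the sample data, only n2 is reported, which mirrors the situation described in this report.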

Additional context to reproduce

No response

Screenshots

Image

Are you sure this issue hasn't been raised already?

  • [x] Yes

Have you read the Code of Conduct?

  • [x] Yes

ababbar-stripe avatar May 06 '25 23:05 ababbar-stripe

Thank you for opening your first issue here! 🛠

welcome[bot] avatar May 06 '25 23:05 welcome[bot]

This is a known issue: Flyte Console can display subworkflows or tasks as "Running" or "Unknown" even when the database and backend report them as "Succeeded" or "Failed." This UI inconsistency has been reported in several issues, especially when the parent workflow is in a terminal state but child nodes or subworkflows remain "Running" in the UI, or when the sidebar/task panel does not refresh automatically (source, source, source).

Workarounds include refreshing the page, deselecting/reselecting the node, or upgrading to the latest Flyte Console version, as some UI refresh bugs have been fixed in recent releases (source, source). However, the underlying eventing and state propagation issues between FlytePropeller, FlyteAdmin, and the UI are still being improved. Would you like more detail on the technical causes or possible backend mitigations?

This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

Mention @runllm to ask follow-up questions. See runllm.com for more!

runllm[bot] avatar May 06 '25 23:05 runllm[bot]

@ababbar-stripe ever since you upgraded the backend to 1.15, have you seen this behavior again?

davidmirror-ops avatar May 13 '25 17:05 davidmirror-ops

> @ababbar-stripe ever since you upgraded the backend to 1.15, have you seen this behavior again?

No, we didn't see this again. Is this a known issue in Flyte 1.11?

ababbar-stripe avatar May 13 '25 21:05 ababbar-stripe

"Hello 👋, this issue has been inactive for over 90 days. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏"

github-actions[bot] avatar Aug 12 '25 00:08 github-actions[bot]

Hello 👋, this issue has been inactive for over 90 days and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

github-actions[bot] avatar Aug 20 '25 00:08 github-actions[bot]

cc @davidmirror-ops As a follow-up, this issue was not resolved by moving to 1.15. We encountered it again last week. As before, the phase column in flyteadmindb.task_executions for this sub-workflow shows SUCCEEDED while the actual workflow status is FAILED, and the Flyte console UI shows RUNNING.

We are currently on flytekit 1.15.3.

Image

ta-stripe avatar Oct 07 '25 17:10 ta-stripe

@ababbar-stripe is the n2 node Running because it's retrying?

davidmirror-ops avatar Oct 30 '25 14:10 davidmirror-ops

#take

yuhuan130 avatar Nov 25 '25 10:11 yuhuan130