dolphinscheduler
Task has been waiting to be executed for more than 12 hours and does not seem to time out
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
This looks the same as bug 7441. The version is 3.1.8, and the cluster has 2 masters and 4 workers. I found a workflow instance that had been running for more than 12 hours, which is abnormal. I then found that a task in the sub-workflow had been waiting to be executed for more than 10 hours. When I stop the workflow and restart it, the problem usually does not reproduce.
What you expected to happen
If a task waits for more than 5 minutes and cannot be executed, the task should be marked as failed.
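To make the expected behaviour concrete, here is a minimal sketch of the check I have in mind. The class, enum, and method names are hypothetical and are not taken from the DolphinScheduler code base; the 5-minute limit is passed in by the caller.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; not DolphinScheduler code.
public class WaitTimeoutCheck {

    enum TaskState { WAITING, RUNNING, FAILED }

    // Returns FAILED when a task is still WAITING after maxWait has elapsed
    // since submission; in every other case the state is returned unchanged.
    static TaskState checkWaitTimeout(TaskState state, Instant submittedAt,
                                      Instant now, Duration maxWait) {
        if (state != TaskState.WAITING) {
            return state;
        }
        Duration waited = Duration.between(submittedAt, now);
        return waited.compareTo(maxWait) > 0 ? TaskState.FAILED : TaskState.WAITING;
    }
}
```

With a 5-minute limit, a task that has waited 4 minutes stays `WAITING`, while one that has waited 10 hours is reported as `FAILED`.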
How to reproduce
It is hard to reproduce; the workflow works normally most of the time. When I stop the workflow and restart it, the problem usually does not reproduce.
Anything else
No response
Version
3.1.x
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Please provide the master log related to the parent workflow and sub-workflow instances.
Hi @epitomizelu, #14986 may fix your problem. The task may actually have been executed, but it still shows as running because of the message backlog for subprocess-type tasks. cc @ruanwenjun WDYT?
We may need to add metrics to record the size of `xxCheckList` in `StateWheelExecuteThread`.
+1
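The metric suggested above could look roughly like the sketch below: each check list is registered with a supplier of its current size, so a backlog shows up when the values are sampled. This is only an illustration in plain Java. `CheckListSizeGauges` and the gauge name are hypothetical, and a real change would more likely go through whatever metrics library the master already uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical registry for illustration only; not DolphinScheduler code.
public class CheckListSizeGauges {

    // Gauge name -> supplier that reads the current size of a check list.
    private final Map<String, IntSupplier> gauges = new LinkedHashMap<>();

    void register(String name, IntSupplier sizeSupplier) {
        gauges.put(name, sizeSupplier);
    }

    // Samples every registered gauge at the moment of the call.
    Map<String, Integer> snapshot() {
        Map<String, Integer> values = new LinkedHashMap<>();
        gauges.forEach((name, supplier) -> values.put(name, supplier.getAsInt()));
        return values;
    }
}
```

For example, registering `queue::size` for a `ConcurrentLinkedQueue` and calling `snapshot()` periodically would show whether the list keeps growing while a task appears stuck.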
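@ruanwenjun Hi, when using a task that contains a subprocess, the main workflow stays in a waiting state once it reaches the subprocess, even though the child nodes have actually finished; the main task's status still shows as running. The image below shows the task execution status and the cluster deployment. Looking forward to your reply.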
![]()
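Looking at this image, the sub-workflow logic node has already succeeded? Which version is this? If it is the current 3.2.x, this kind of problem should no longer occur: in 3.2.x the sub-workflow node pulls the status, which avoids the earlier problem where a failed push left the task status without an update.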
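As a rough illustration of the pull approach described above (not the actual 3.2.x implementation), the parent side repeatedly asks a state source for the sub-workflow's state instead of waiting for a pushed event, so a lost message cannot leave the parent stuck. `SubWorkflowStatePoller`, `State`, and the parameters are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical poller for illustration only; not DolphinScheduler code.
public class SubWorkflowStatePoller {

    enum State { RUNNING, SUCCESS, FAILURE }

    // Polls stateSource until it reports a finished state or maxPolls is reached.
    // Returns the last state observed (RUNNING if it never finished in time).
    static State pollUntilFinished(Supplier<State> stateSource, int maxPolls, long intervalMillis) {
        State last = State.RUNNING;
        for (int i = 0; i < maxPolls; i++) {
            last = stateSource.get();
            if (last != State.RUNNING) {
                return last;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                return last;
            }
        }
        return last;
    }
}
```

In this sketch `stateSource` stands in for a lookup of the sub-workflow instance's persisted state; the trade-off is some polling latency in exchange for not depending on every push being delivered.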
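@ruanwenjun Hi, in 3.2.x, when a workflow contains a sub-workflow and a task in the sub-workflow fails, clicking "rerun from failed node" also relaunches all of the tasks in the sub-workflow that had already succeeded. Have you encountered a similar problem?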
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.
