[Bug] [Master] The subtask has actually completed, but the parent workflow still waits for the subtask to complete
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
Occasionally, the workflow gets stuck on a subtask indefinitely.
The subtask has actually completed, but the parent workflow keeps waiting for it to finish, as you can see in the image below.
In fact, the sub-workflow has finished, but the state returned by the API is still "RUNNING_EXECUTION".
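For reference, the mismatch can be confirmed against the REST API. The sketch below is only an illustration: the `process-instances/{id}` endpoint path, the `token` header, and all placeholder values are my assumptions, not taken from this report.

```python
import requests

# Placeholder values; adjust to your environment.
ADDR = "127.0.0.1:12345"
TOKEN = "<your-api-token>"
PROJECT_CODE = 1234567890      # code of the project that owns the sub-workflow
PROCESS_INSTANCE_ID = 42       # id of the sub-workflow instance that looks stuck

# Assumed endpoint: GET /dolphinscheduler/projects/{projectCode}/process-instances/{id}
url = (f"http://{ADDR}/dolphinscheduler/projects/{PROJECT_CODE}"
       f"/process-instances/{PROCESS_INSTANCE_ID}")
res = requests.get(url, headers={"token": TOKEN, "accept": "application/json"})
res.raise_for_status()
data = res.json()["data"]

# Even after the sub-workflow has visibly finished, this still reports RUNNING_EXECUTION.
print(data["state"], data.get("startTime"), data.get("endTime"))
```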
What you expected to happen
When the subtask completes, the parent workflow is supposed to finish as well, instead of getting stuck.
How to reproduce
I haven't found a reliable way to reproduce it yet.
Anything else
No response
Version
3.0.0-beta-1
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thank you for your feedback, we have received your issue. Please wait patiently for a reply.
- In order for us to understand your request as soon as possible, please provide detailed information, version, or pictures.
- If you haven't received a reply for a long time, you can join our Slack and send your question to the #troubleshooting channel.
It also appeared in version 2.0.5. Some tasks of the subprocess are submitted at the same time and fail almost at the same time.
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.
When executing a `python -c` script, the node status displayed here is successful, but the time is inconsistent with the log, and is even earlier than the log shows.
@SbloodyS @davidzollo
```shell
cd '${file_path}'
python -c "import test1;test1.get_task_instances('${project_code}','','','')"
```
test1.py
```python
import sys
import os
import requests

# Add the parent of this script's directory to the import path.
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))


def get_task_instances(project_code, process_instance_id="", task_name="", state_type=""):
    token = 'ac87b6898942649eca6008addb153ac4'
    addr = "127.0.0.1:12345"
    process_instances_url = 'http://{addr}/dolphinscheduler/projects/{projectCode}/task-instances'
    headers = {
        "token": token,
        "accept": "application/json; charset=utf-8"
    }
    params = {
        "projectCode": project_code,
        "processInstanceId": process_instance_id,
        "taskName": task_name,
        "stateType": state_type,
        "pageNo": 1,
        "pageSize": 1000000
    }
    url = process_instances_url.format(addr=addr, projectCode=project_code)
    res = requests.get(url, params=params, headers=headers)
    result = res.json()
    print('Task instances get_task_instance_by_id', len(result))
    if result['code'] == 0 and result['data']:
        return result['data']['totalList']
    else:
        raise ValueError('Failed to query workflow instances')
```
```shell
sleep 10
echo '1111'
```
But this is a single machine; the deployment mode is pseudo-cluster.
Is this a bug?
Your problem may be caused by an event-handling exception that blocks the event queue, so the sub-process status-check event never gets processed. That is what happened in my case. I have since resolved it and suggested an optimization to the community: https://github.com/apache/dolphinscheduler/issues/11388#issue-1334131095
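To illustrate the failure mode described above, here is a minimal, self-contained sketch (not DolphinScheduler's actual Master code): when a single loop drains all events for a workflow instance and one handler throws without per-event error handling, every event queued behind it, including the sub-process state-check event, is never processed and the parent appears stuck.

```python
import queue
import threading
import time

events = queue.Queue()

def handle(event):
    # Hypothetical handler: one malformed event raises and is not caught per-event.
    if event["type"] == "TASK_STATE_CHANGE" and event.get("poison"):
        raise RuntimeError("unexpected state transition")
    print("handled", event["type"])

def event_loop():
    while True:
        # No try/except around handle(): the first failure kills this loop,
        # so everything still in the queue is silently left unprocessed.
        handle(events.get())

threading.Thread(target=event_loop, daemon=True).start()

events.put({"type": "TASK_STATE_CHANGE", "poison": True})
events.put({"type": "PROCESS_STATE_CHECK"})  # never handled -> parent waits forever

time.sleep(1)  # give the loop time to fail; only the traceback is printed
```

Catching and logging failures per event (or moving bad events aside) keeps the queue draining; see the linked issue for the optimization that was actually proposed.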
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.
This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.