dolphinscheduler
dolphinscheduler copied to clipboard
[Bug] [Module Name] Bug title Handle task result event error, the task instance is already finished, will discord this event
Search before asking
- [x] I had searched in the issues and found no similar issues.
What happened
MASTER SERVER LOG
[ERROR] 2025-02-14 09:01:43.302 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable:[65] - [WorkflowInstance-0][TaskInstance-0] - Handle task event error, this event will be removed, event: TaskEvent(taskInstanceId=69938, workerAddress=xx.xx.xx.xx:1234, state=TaskExecutionStatus{code=7, desc='success'}, startTime=Fri Feb 14 09:00:45 GMT+03:00 2025, endTime=Fri Feb 14 09:01:42 GMT+03:00 2025, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69938, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69938.log, processId=0, appIds=null, event=RESULT, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}], cacheTaskInstanceId=0, channel=[id: 0x68b20e17, L:/xx.xx.xx.xx:5678 - R:/xx.xx.xx.xx:56156], processInstanceId=7331)
org.apache.dolphinscheduler.server.master.event.TaskEventHandleError: Handle task result event error, the task instance is already finished, will discord this event
at org.apache.dolphinscheduler.server.master.event.TaskResultEventHandler.handleTaskEvent(TaskResultEventHandler.java:83)
at org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable.run(TaskExecuteRunnable.java:57)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
[INFO] 2025-02-14 09:01:43.302 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[126] - [WorkflowInstance-0][TaskInstance-0] - [WorkflowInstance-7331] persist events succeeded
[INFO] 2025-02-14 09:01:44.743 +0300 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResultProcessor:[63] - [WorkflowInstance-7331][TaskInstance-69940] - Received task execute result, event: TaskEvent(taskInstanceId=69940, workerAddress=xx.xx.xx.xx:1234, state=TaskExecutionStatus{code=7, desc='success'}, startTime=Fri Feb 14 09:00:45 GMT+03:00 2025, endTime=Fri Feb 14 09:01:44 GMT+03:00 2025, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69940, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69940.log, processId=0, appIds=null, event=RESULT, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}], cacheTaskInstanceId=0, channel=[id: 0x68b20e17, L:/xx.xx.xx.xx:5678 - R:/xx.xx.xx.xx:56156], processInstanceId=7331)
[INFO] 2025-02-14 09:01:45.302 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable:[56] - [WorkflowInstance-7331][TaskInstance-69940] - Handle task event begin: TaskEvent(taskInstanceId=69940, workerAddress=xx.xx.xx.xx:1234, state=TaskExecutionStatus{code=7, desc='success'}, startTime=Fri Feb 14 09:00:45 GMT+03:00 2025, endTime=Fri Feb 14 09:01:44 GMT+03:00 2025, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69940, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69940.log, processId=0, appIds=null, event=RESULT, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}], cacheTaskInstanceId=0, channel=[id: 0x68b20e17, L:/xx.xx.xx.xx:5678 - R:/xx.xx.xx.xx:56156], processInstanceId=7331)
[ERROR] 2025-02-14 09:01:45.303 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable:[65] - [WorkflowInstance-0][TaskInstance-0] - Handle task event error, this event will be removed, event: TaskEvent(taskInstanceId=69940, workerAddress=xx.xx.xx.xx:1234, state=TaskExecutionStatus{code=7, desc='success'}, startTime=Fri Feb 14 09:00:45 GMT+03:00 2025, endTime=Fri Feb 14 09:01:44 GMT+03:00 2025, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69940, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69940.log, processId=0, appIds=null, event=RESULT, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}], cacheTaskInstanceId=0, channel=[id: 0x68b20e17, L:/xx.xx.xx.xx:5678 - R:/xx.xx.xx.xx:56156], processInstanceId=7331)
org.apache.dolphinscheduler.server.master.event.TaskEventHandleError: Handle task result event error, the task instance is already finished, will discord this event
at org.apache.dolphinscheduler.server.master.event.TaskResultEventHandler.handleTaskEvent(TaskResultEventHandler.java:83)
at org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable.run(TaskExecuteRunnable.java:57)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
[INFO] 2025-02-14 09:01:45.303 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[126] - [WorkflowInstance-0][TaskInstance-0] - [WorkflowInstance-7331] persist events succeeded
[INFO] 2025-02-14 09:01:54.023 +0300 org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener:[73] - [WorkflowInstance-0][TaskInstance-0] - worker node added : /nodes/worker/xx.xx.xx.xx:1234
Task Instance Detail
********************************************************************************
Task Name: ORG_GOP_WHITE_LIST
Workflow Instance Name: WF_Organizasyon-34-20250214090000292
Task Execute Type: batch
Execute State: need fault tolerance
Host: xx.xx.xx.xx:1234
Task Type: PROCEDURE
Priority: medium
Tenant: default
First Submit Time: Fri Feb 14 09:00:45 GMT+03:00 2025
Submit Time: Fri Feb 14 09:00:45 GMT+03:00 2025
Start Time: Fri Feb 14 09:00:45 GMT+03:00 2025
End Time: null
Task Instance Detail
********************************************************************************
Task Name: ORG_GIP_WHITE_LIST
Workflow Instance Name: WF_Organizasyon-34-20250214090000292
Task Execute Type: batch
Execute State: need fault tolerance
Host: xx.xx.xx.xx:1234
Task Type: PROCEDURE
Priority: medium
Tenant: default
First Submit Time: Fri Feb 14 09:00:45 GMT+03:00 2025
Submit Time: Fri Feb 14 09:00:45 GMT+03:00 2025
Start Time: Fri Feb 14 09:00:45 GMT+03:00 2025
End Time: null
Task Instance Detail
********************************************************************************
Task Name: ORG_IA_WHITE_LIST
Workflow Instance Name: WF_Organizasyon-34-20250214090000292
Task Execute Type: batch
Execute State: need fault tolerance
Host: xx.xx.xx.xx:1234
Task Type: PROCEDURE
Priority: medium
Tenant: default
First Submit Time: Fri Feb 14 09:00:45 GMT+03:00 2025
Submit Time: Fri Feb 14 09:00:45 GMT+03:00 2025
Start Time: Fri Feb 14 09:00:45 GMT+03:00 2025
End Time: null
--WORKER SERVER LOG--
[WARN] 2025-02-14 09:01:56.608 +0300 org.apache.dolphinscheduler.remote.NettyRemotingClient:[321] - [WorkflowInstance-7331][TaskInstance-69942] - connect to Host(ip=xx.xx.xx.xx, port=5678) error
java.lang.IllegalStateException: executor not accepting a task
at io.netty.resolver.AddressResolverGroup.getResolver(AddressResolverGroup.java:61)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:194)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:989)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:504)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:417)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:474)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:387)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
[ERROR] 2025-02-14 09:01:56.609 +0300 org.apache.dolphinscheduler.server.worker.rpc.WorkerMessageSender:[69] - [WorkflowInstance-7331][TaskInstance-69942] - Send message error, messageType: TASK_EXECUTE_RESULT_MESSAGE, message: TaskExecuteResultMessage(super=BaseMessage(messageSenderAddress=xx.xx.xx.xx:1234, messageReceiverAddress=xx.xx.xx.xx:5678, messageSendTime=1739512915680), taskInstanceId=69942, processInstanceId=7331, status=7, startTime=1739512876838, host=xx.xx.xx.xx:1234, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69942.log, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69942, endTime=1739512915680, processId=0, appIds=null, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}])
What you expected to happen
Task instance id of parallel task instance will mix up when reporting task status
How to reproduce
Shared all workflow definitions
Anything else
No response
Version
3.2.x
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
version is 3.2.0
Is there any update about the issue?
Seems caused by worker failover? since this whole logic has already refactored, please test on dev branch.
Thank you for response. On a dev branch ? What about version 3.2.2
Thank you for response. On a dev branch ? What about version 3.2.2
In fact, no one maintain 3.2.2 😢
What about 3.3.2 ?
What about 3.1.9?
What about 3.3.2 ?
3.3.2 is a suggested version.
What about 3.1.9?
3.1.9 will not maintain.