dolphinscheduler icon indicating copy to clipboard operation
dolphinscheduler copied to clipboard

[Bug] [Module Name] Bug title Handle task result event error, the task instance is already finished, will discord this event

Open LeBakii opened this issue 9 months ago • 5 comments

Search before asking

  • [x] I had searched in the issues and found no similar issues.

What happened

MASTER SERVER LOG

[ERROR] 2025-02-14 09:01:43.302 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable:[65] - [WorkflowInstance-0][TaskInstance-0] - Handle task event error, this event will be removed, event: TaskEvent(taskInstanceId=69938, workerAddress=xx.xx.xx.xx:1234, state=TaskExecutionStatus{code=7, desc='success'}, startTime=Fri Feb 14 09:00:45 GMT+03:00 2025, endTime=Fri Feb 14 09:01:42 GMT+03:00 2025, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69938, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69938.log, processId=0, appIds=null, event=RESULT, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}], cacheTaskInstanceId=0, channel=[id: 0x68b20e17, L:/xx.xx.xx.xx:5678 - R:/xx.xx.xx.xx:56156], processInstanceId=7331)
org.apache.dolphinscheduler.server.master.event.TaskEventHandleError: Handle task result event error, the task instance is already finished, will discord this event
        at org.apache.dolphinscheduler.server.master.event.TaskResultEventHandler.handleTaskEvent(TaskResultEventHandler.java:83)
        at org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable.run(TaskExecuteRunnable.java:57)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
[INFO] 2025-02-14 09:01:43.302 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[126] - [WorkflowInstance-0][TaskInstance-0] - [WorkflowInstance-7331] persist events succeeded
[INFO] 2025-02-14 09:01:44.743 +0300 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResultProcessor:[63] - [WorkflowInstance-7331][TaskInstance-69940] - Received task execute result, event: TaskEvent(taskInstanceId=69940, workerAddress=xx.xx.xx.xx:1234, state=TaskExecutionStatus{code=7, desc='success'}, startTime=Fri Feb 14 09:00:45 GMT+03:00 2025, endTime=Fri Feb 14 09:01:44 GMT+03:00 2025, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69940, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69940.log, processId=0, appIds=null, event=RESULT, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}], cacheTaskInstanceId=0, channel=[id: 0x68b20e17, L:/xx.xx.xx.xx:5678 - R:/xx.xx.xx.xx:56156], processInstanceId=7331)
[INFO] 2025-02-14 09:01:45.302 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable:[56] - [WorkflowInstance-7331][TaskInstance-69940] - Handle task event begin: TaskEvent(taskInstanceId=69940, workerAddress=xx.xx.xx.xx:1234, state=TaskExecutionStatus{code=7, desc='success'}, startTime=Fri Feb 14 09:00:45 GMT+03:00 2025, endTime=Fri Feb 14 09:01:44 GMT+03:00 2025, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69940, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69940.log, processId=0, appIds=null, event=RESULT, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}], cacheTaskInstanceId=0, channel=[id: 0x68b20e17, L:/xx.xx.xx.xx:5678 - R:/xx.xx.xx.xx:56156], processInstanceId=7331)
[ERROR] 2025-02-14 09:01:45.303 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable:[65] - [WorkflowInstance-0][TaskInstance-0] - Handle task event error, this event will be removed, event: TaskEvent(taskInstanceId=69940, workerAddress=xx.xx.xx.xx:1234, state=TaskExecutionStatus{code=7, desc='success'}, startTime=Fri Feb 14 09:00:45 GMT+03:00 2025, endTime=Fri Feb 14 09:01:44 GMT+03:00 2025, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69940, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69940.log, processId=0, appIds=null, event=RESULT, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}], cacheTaskInstanceId=0, channel=[id: 0x68b20e17, L:/xx.xx.xx.xx:5678 - R:/xx.xx.xx.xx:56156], processInstanceId=7331)
org.apache.dolphinscheduler.server.master.event.TaskEventHandleError: Handle task result event error, the task instance is already finished, will discord this event
        at org.apache.dolphinscheduler.server.master.event.TaskResultEventHandler.handleTaskEvent(TaskResultEventHandler.java:83)
        at org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable.run(TaskExecuteRunnable.java:57)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
[INFO] 2025-02-14 09:01:45.303 +0300 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[126] - [WorkflowInstance-0][TaskInstance-0] - [WorkflowInstance-7331] persist events succeeded
[INFO] 2025-02-14 09:01:54.023 +0300 org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener:[73] - [WorkflowInstance-0][TaskInstance-0] - worker node added : /nodes/worker/xx.xx.xx.xx:1234

Image Image

                    Task Instance Detail
********************************************************************************
Task Name:              ORG_GOP_WHITE_LIST
Workflow Instance Name: WF_Organizasyon-34-20250214090000292
Task Execute Type:      batch
Execute State:          need fault tolerance
Host:                   xx.xx.xx.xx:1234
Task Type:              PROCEDURE
Priority:               medium
Tenant:                 default
First Submit Time:      Fri Feb 14 09:00:45 GMT+03:00 2025
Submit Time:            Fri Feb 14 09:00:45 GMT+03:00 2025
Start Time:             Fri Feb 14 09:00:45 GMT+03:00 2025
End Time:               null
                    Task Instance Detail
********************************************************************************
Task Name:              ORG_GIP_WHITE_LIST
Workflow Instance Name: WF_Organizasyon-34-20250214090000292
Task Execute Type:      batch
Execute State:          need fault tolerance
Host:                   xx.xx.xx.xx:1234
Task Type:              PROCEDURE
Priority:               medium
Tenant:                 default
First Submit Time:      Fri Feb 14 09:00:45 GMT+03:00 2025
Submit Time:            Fri Feb 14 09:00:45 GMT+03:00 2025
Start Time:             Fri Feb 14 09:00:45 GMT+03:00 2025
End Time:               null
                    Task Instance Detail
********************************************************************************
Task Name:              ORG_IA_WHITE_LIST
Workflow Instance Name: WF_Organizasyon-34-20250214090000292
Task Execute Type:      batch
Execute State:          need fault tolerance
Host:                   xx.xx.xx.xx:1234
Task Type:              PROCEDURE
Priority:               medium
Tenant:                 default
First Submit Time:      Fri Feb 14 09:00:45 GMT+03:00 2025
Submit Time:            Fri Feb 14 09:00:45 GMT+03:00 2025
Start Time:             Fri Feb 14 09:00:45 GMT+03:00 2025
End Time:               null

--WORKER SERVER LOG--

[WARN] 2025-02-14 09:01:56.608 +0300 org.apache.dolphinscheduler.remote.NettyRemotingClient:[321] - [WorkflowInstance-7331][TaskInstance-69942] - connect to Host(ip=xx.xx.xx.xx, port=5678) error
java.lang.IllegalStateException: executor not accepting a task
        at io.netty.resolver.AddressResolverGroup.getResolver(AddressResolverGroup.java:61)
        at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:194)
        at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
        at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:989)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:504)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:417)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:474)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:387)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.base/java.lang.Thread.run(Thread.java:829)
[ERROR] 2025-02-14 09:01:56.609 +0300 org.apache.dolphinscheduler.server.worker.rpc.WorkerMessageSender:[69] - [WorkflowInstance-7331][TaskInstance-69942] - Send message error, messageType: TASK_EXECUTE_RESULT_MESSAGE, message: TaskExecuteResultMessage(super=BaseMessage(messageSenderAddress=xx.xx.xx.xx:1234, messageReceiverAddress=xx.xx.xx.xx:5678, messageSendTime=1739512915680), taskInstanceId=69942, processInstanceId=7331, status=7, startTime=1739512876838, host=xx.xx.xx.xx:1234, logPath=/data/dolphin/worker-server/logs/20250214/13433966255552/34/7331/69942.log, executePath=/data/dolphin/exec/process/default/13433961245760/13433966255552_34/7331/69942, endTime=1739512915680, processId=0, appIds=null, varPool=[{"prop":"param_etl_date","direct":"IN","type":"VARCHAR","value":"20250215"}])

What you expected to happen

Task instance id of parallel task instance will mix up when reporting task status

How to reproduce

Shared all workflow definitions

Anything else

No response

Version

3.2.x

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

LeBakii avatar Feb 14 '25 12:02 LeBakii

version is 3.2.0

LeBakii avatar Feb 14 '25 12:02 LeBakii

Is there any update about the issue?

LeBakii avatar Feb 24 '25 11:02 LeBakii

Seems caused by worker failover? since this whole logic has already refactored, please test on dev branch.

ruanwenjun avatar Feb 24 '25 14:02 ruanwenjun

Thank you for response. On a dev branch ? What about version 3.2.2

LeBakii avatar Mar 06 '25 06:03 LeBakii

Thank you for response. On a dev branch ? What about version 3.2.2

In fact, no one maintain 3.2.2 😢

ruanwenjun avatar Mar 07 '25 01:03 ruanwenjun

What about 3.3.2 ?

XceptN avatar Nov 06 '25 08:11 XceptN

What about 3.1.9?

BilgeKaanGencdogan avatar Nov 06 '25 11:11 BilgeKaanGencdogan

What about 3.3.2 ?

3.3.2 is a suggested version.

ruanwenjun avatar Nov 07 '25 09:11 ruanwenjun

What about 3.1.9?

3.1.9 will not maintain.

ruanwenjun avatar Nov 07 '25 09:11 ruanwenjun