dolphinscheduler [Bug] [Master-server 3.2.1] Host of task instance is null appears in version 3.2.1

Search before asking

[X] I had searched in the issues and found no similar issues.

What happened

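After upgrading from 3.1.9 to 3.2.1, we see five or six task instances fail to be scheduled each day. On investigation, the host column in the task_instance table is null for these tasks at scheduling time, as shown in the screenshots: bug12 bug13 bug14

bug11

The log output is as follows: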

[INFO] 2024-02-22 03:00:16.737 -0500 o.a.d.a.s.i.SchedulerServiceImpl:[809] - Schedule update complete, projectCode:12669712879296, processDefinitionCode:12670706423502, scheduleId:2.
[INFO] 2024-02-22 03:00:18.898 -0500 o.a.d.s.q.QuartzScheduler:[104] - Add job, job name: job_2, group name: jobgroup_1
[INFO] 2024-02-22 03:00:18.908 -0500 o.a.d.s.q.QuartzScheduler:[137] - schedule job trigger, triggerName: job_2, triggerGroupName: jobgroup_1, cronExpression: 30 0/1 * * * ? *, startDate: Thu Feb 22 03:00:18 EST 2024, endDate: Fri Jan 30 02:00:00 EST 2224
[ERROR] 2024-02-22 03:00:42.551 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120691.
[ERROR] 2024-02-22 03:01:22.126 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120693.
[ERROR] 2024-02-22 03:01:53.114 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120691.
[ERROR] 2024-02-22 03:02:22.820 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120693.
[ERROR] 2024-02-22 03:03:08.621 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120693.
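
For anyone triaging the same symptom, here is a minimal sketch (not part of DolphinScheduler) that counts how often each task instance id shows up in the "Host of task instance is null" error. The function name `count_null_host_errors` is made up for illustration, and the regular expression only assumes the message format seen in the log excerpt from this report.

```python
import re
from collections import Counter

# Matches the error message exactly as it appears in the log excerpt of this report.
NULL_HOST = re.compile(r"Host of task instance is null, taskInstanceId:(\d+)")


def count_null_host_errors(log_text: str) -> Counter:
    """Return a Counter mapping task instance id -> number of null-host errors."""
    return Counter(int(task_id) for task_id in NULL_HOST.findall(log_text))


# The five ERROR lines from the report, one entry per line.
sample_log = """\
[ERROR] 2024-02-22 03:00:42.551 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120691.
[ERROR] 2024-02-22 03:01:22.126 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120693.
[ERROR] 2024-02-22 03:01:53.114 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120691.
[ERROR] 2024-02-22 03:02:22.820 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120693.
[ERROR] 2024-02-22 03:03:08.621 -0500 o.a.d.a.s.i.LoggerServiceImpl:[96] - Host of task instance is null, taskInstanceId:120693.
"""

counts = count_null_host_errors(sample_log)
for task_id, occurrences in sorted(counts.items()):
    print(task_id, occurrences)
```

On the excerpt above, the same two task instances account for all five errors, which suggests the message is logged repeatedly for a task that never received a host rather than once per failed task.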

What you expected to happen

There should not be several tasks failing to be scheduled every day.

How to reproduce

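I have been running tasks on 3.2.1 for three days and the problem appears every day. In the database, the rows in the task_instance table with an empty host are the affected tasks.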
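
The check described above can be sketched as a query for rows with a null or empty host. This is only an illustration under assumptions: it uses an in-memory SQLite table as a stand-in for the real metadata database, the table layout is reduced to three columns, and the sample rows and host values are invented. The actual table name, columns, and database engine in a given deployment may differ.

```python
import sqlite3

# In-memory stand-in for the scheduler's metadata database.
# Table and column names are simplified assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (id INTEGER PRIMARY KEY, name TEXT, host TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance (id, name, host) VALUES (?, ?, ?)",
    [
        (120690, "task_a", "worker-1:1234"),  # hypothetical healthy row
        (120691, "task_b", None),             # host never assigned
        (120692, "task_c", "worker-2:1234"),  # hypothetical healthy row
        (120693, "task_d", None),             # host never assigned
    ],
)


def find_null_host_ids(connection: sqlite3.Connection) -> list:
    """Return ids of task instances whose host is NULL or an empty string."""
    rows = connection.execute(
        "SELECT id FROM task_instance "
        "WHERE host IS NULL OR host = '' "
        "ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


null_host_ids = find_null_host_ids(conn)
print(null_host_ids)
```

Checking for both NULL and the empty string is a defensive choice here, since the report describes the host as "empty" in one place and "null" in another.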

Anything else

No response

Version

3.2.x

Are you willing to submit PR?

[X] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Feb 22 '24 08:02 liubo988

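+1. I saw this after upgrading from 3.1.9 to 3.2.0, and again after upgrading to 3.2.1, and in my case it happens even more often. I hope someone can help look into why so many task instances have a null host.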

Feb 23 '24 06:02 z0L1n

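After two days of testing, I found what caused the null task instance host in my case. I deploy on a cluster of physical machines. Running timedatectl showed that the machines' time zone was New York. I changed the time zone to Asia/Shanghai (CST, +0800) and restarted the services, and the problem was resolved, as shown in the screenshot: 企业微信截图_17096115034609 @zhongjiajie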

Mar 06 '24 01:03 liubo988

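Screenshot: 企业微信截图_17096900323858. As shown above, the time zone environment variable in dolphinscheduler_env.sh is set to GMT+8, which is fine, but because of the server's own time zone the times were 13 hours behind.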
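
The 13-hour figure can be checked with a little arithmetic. The sketch below assumes fixed offsets: -0500 (EST, as printed in the log lines of the original report) for the server and +0800 for Asia/Shanghai. It only illustrates the size of the gap and says nothing about how DolphinScheduler itself handles time zones.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, so the example does not depend on a time zone database.
NEW_YORK_EST = timezone(timedelta(hours=-5), "EST")  # the -0500 seen in the log lines
SHANGHAI_CST = timezone(timedelta(hours=8), "CST")   # Asia/Shanghai, +0800

# Timestamp of the first ERROR line in the report, in the server's local time.
log_time = datetime(2024, 2, 22, 3, 0, 42, tzinfo=NEW_YORK_EST)

# The same instant expressed in Shanghai time.
shanghai_time = log_time.astimezone(SHANGHAI_CST)

# Difference between the two wall clocks.
gap = shanghai_time.utcoffset() - log_time.utcoffset()

print(shanghai_time.isoformat())
print(gap)
```

A server clock on New York time therefore reads 13 hours earlier than one on Shanghai time for the same instant, which matches the discrepancy described above.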

Mar 06 '24 02:03 liubo988

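+1. I saw this after upgrading from 3.1.9 to 3.2.0, and again after upgrading to 3.2.1, and in my case it happens even more often. I hope someone can help look into why so many task instances have a null host.

Update: after repeated attempts and tracing, I found that momentary spikes in task memory usage caused the DolphinScheduler services to misbehave, so that later fault-tolerant recovery and reruns could not execute normally. "Task instance host is null" then appeared many times, which in turn caused too many task failures.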

Mar 06 '24 02:03 z0L1n

3.2.0 has the same problem

Mar 19 '24 06:03 q4q5q6qw

Later the service was restarted once, and the problem appeared again...

Mar 25 '24 06:03 liubo988

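We also hit this problem on 3.2.1. So far it looks like the cause may be that a full GC reclaimed the workers stored in the map, or that the configured worker resources are insufficient.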

Mar 29 '24 05:03 sean1205

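sean1205 wrote: "We also hit this problem on 3.2.1. So far it looks like the cause may be that a full GC reclaimed the workers stored in the map, or that the configured worker resources are insufficient."

We ran into the same problem. How did you solve it in the end? By adding worker machines?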

Apr 10 '24 06:04 ahululu

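sean1205 wrote: "We also hit this problem on 3.2.1. So far it looks like the cause may be that a full GC reclaimed the workers stored in the map, or that the configured worker resources are insufficient."

ahululu wrote: "We ran into the same problem. How did you solve it in the end? By adding worker machines?"

Yes. We increased the master's maximum heap size to reduce full GCs and added worker nodes, and that resolved it.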

Apr 17 '24 09:04 sean1205

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

May 18 '24 00:05 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

May 27 '24 00:05 github-actions[bot]
