dolphinscheduler [Bug] Timed task not trigger

Search before asking

[X] I had searched in the issues and found no similar issues.

What happened

[INFO] 2024-06-21 16:57:01.556 +0800 o.a.d.s.q.QuartzScheduler:[104] - Add job, job name: job_39, group name: jobgroup_1
[INFO] 2024-06-21 16:57:01.606 +0800 o.a.d.s.q.QuartzScheduler:[137] - schedule job trigger, triggerName: job_39, triggerGroupName: jobgroup_1, cronExpression: 10 * * * * ? *, startDate: Fri Jun 21 16:57:01 CST 2024, endDate: Wed Jun 21 00:00:00 CST 2124

My timed task was added successfully but never triggers.

What you expected to happen

The task should be triggered every minute.
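As a sanity check on that expectation, here is a minimal sketch of when the cron expression from the log (`10 * * * * ? *`, i.e. second 10 of every minute) should fire. It is not Quartz's cron parser; `next_fire_at_second` is a hypothetical helper that only handles this one pattern, where the seconds field is fixed and every other field is a wildcard.

```python
from datetime import datetime, timedelta


def next_fire_at_second(after: datetime, second: int = 10) -> datetime:
    """Return the next instant strictly after `after` whose seconds field is `second`.

    Hypothetical helper: covers only expressions shaped like `10 * * * * ? *`.
    """
    candidate = after.replace(second=second, microsecond=0)
    if candidate <= after:
        # Second 10 of the current minute has already passed; use the next minute.
        candidate += timedelta(minutes=1)
    return candidate


# startDate from the log line above: 2024-06-21 16:57:01.606 (+0800)
start = datetime(2024, 6, 21, 16, 57, 1, 606000)
first = next_fire_at_second(start)
second_fire = next_fire_at_second(first)
print(first, second_fire)
```

Under these assumptions the first fire should be at 16:57:10, nine seconds after the trigger was scheduled, and then once per minute after that.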

How to reproduce

Create a 'shell' task that prints a message, then bring its timed schedule online.

Anything else

No response

Version

3.2.x

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Jun 21 '24 08:06 1032851561

Is there any error log in the master, or any error command in t_ds_error_command? You can get the scheduler execution count from the ds_master_quartz_job_executed metric.
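A rough sketch of reading that counter, assuming the master's metrics are available in Prometheus text exposition format. The sample text, its label, and the `_total` suffix are made up for illustration; `sum_metric` is a hypothetical helper, and how you fetch the text depends on your deployment.

```python
def sum_metric(exposition_text: str, prefix: str) -> float:
    """Sum all samples whose metric name starts with `prefix`.

    Simplified parser: assumes label values do not contain '}'.
    """
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if name.startswith(prefix) and fields:
            total += float(fields[0])
    return total


# Hypothetical scrape output; real metric names and labels may differ.
sample = """\
# TYPE ds_master_quartz_job_executed_total counter
ds_master_quartz_job_executed_total{application="master-server"} 0.0
jvm_threads_live_threads 87.0
"""
executed = sum_metric(sample, "ds_master_quartz_job_executed")
print(executed)
```

If the value stays at zero while schedules are online, that suggests the Quartz job itself is not firing, as opposed to commands being generated and then failing.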

Jun 21 '24 11:06 ruanwenjun

Jun 21 '24 13:06 1032851561

ProcessScheduleTask#executeInternal runs on the master. You need to provide more information, e.g. your cluster details. Can this bug be reproduced?

Jun 22 '24 14:06 ruanwenjun

The bug always occurs. No timed job triggers. My cluster: Docker deployment, 1 master, 1 worker, 1 API server, PostgreSQL database.

The process goes like this:

All timed tasks ran normally for a long time.
A Docker exception occurred, and the master, worker, and API server went down.
After starting the cluster again, no task triggered.
Found errors in the master log, such as 'Master handle command xxx error'.
Manually changed the record xxx in the t_ds_process_instance table: state -> 7.
Deleted all records in t_ds_error_command.
Restarted the master; there were no more error logs.

I can't see the ProcessScheduleTask log line in the master log (scheduled fire time :{}, fire time......), so is something wrong with Quartz?

master.log

Jun 23 '24 02:06 1032851561

I tried to debug the master:

Running the SQL directly:

Jun 23 '24 03:06 1032851561

This is caused by the master failing to handle a command. You can find the reason in t_ds_error_command or in the master error log.

Jun 24 '24 02:06 ruanwenjun

https://github.com/apache/dolphinscheduler/issues/16197#issuecomment-2184493186

Jun 24 '24 03:06 1032851561

If you delete the records from t_ds_error_command, you cannot find out why the command handling failed. I don't understand why you deleted them; these records do not affect the system.

Jun 27 '24 13:06 ruanwenjun

My problem is not why the command handling failed. It is why ProcessScheduleTask doesn't execute. This is a Quartz job, and it is not triggered.

Please see this: https://github.com/apache/dolphinscheduler/issues/16197#issuecomment-2184493186

Jun 28 '24 07:06 1032851561

I'm still not sure what your problem is. At the moment, DolphinScheduler processes a timed task in two steps:

Generate a command via the Quartz job.
Execute the command.

Do you mean that step one is failing? Many things can prevent step one from executing, e.g. incorrect Quartz metadata, a blocked Quartz main thread, or a database lock. You can find details in the log, and check whether there is a deadlock in the database.

Jun 29 '24 06:06 ruanwenjun

Yes, step one is failing: it is never triggered. The Quartz main thread is running, and it queries the qrtz_triggers table to find timed jobs that are due to fire. When I debug the master service remotely, the code returns 0 records, but running the SQL directly in the database returns 3 records.
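One possible explanation for a 0 vs 3 mismatch like this (not confirmed in this thread) is that the manual query was looser than the one the scheduler runs. The sketch below uses an in-memory SQLite table as a stand-in for qrtz_triggers and a simplified version of the filtering a Quartz JDBC job store applies when acquiring triggers (scheduler name, trigger state, and a misfire window). The rows, the scheduler name, the timestamp, and the 60-second threshold are all hypothetical, and the exact SQL in your Quartz version may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE qrtz_triggers (
        sched_name TEXT, trigger_name TEXT, trigger_state TEXT,
        next_fire_time INTEGER,  -- epoch milliseconds
        misfire_instr INTEGER
    )
""")

now_ms = 1_719_640_000_000  # hypothetical "now" in epoch milliseconds
rows = [
    ("DolphinScheduler", "job_39", "ERROR", now_ms - 5_000, 0),        # wrong state
    ("OtherScheduler", "job_40", "WAITING", now_ms - 5_000, 0),        # other scheduler name
    ("DolphinScheduler", "job_41", "WAITING", now_ms - 3_600_000, 0),  # stale fire time
]
conn.executemany("INSERT INTO qrtz_triggers VALUES (?, ?, ?, ?, ?)", rows)

# A loose manual query: every trigger whose fire time has passed.
manual = conn.execute(
    "SELECT COUNT(*) FROM qrtz_triggers WHERE next_fire_time <= ?", (now_ms,)
).fetchone()[0]

# Simplified acquisition-style query (assumed shape, see lead-in).
misfire_threshold_ms = 60_000
acquired = conn.execute(
    """
    SELECT COUNT(*) FROM qrtz_triggers
    WHERE sched_name = ?
      AND trigger_state = 'WAITING'
      AND next_fire_time <= ?
      AND (misfire_instr = -1 OR next_fire_time >= ?)
    """,
    ("DolphinScheduler", now_ms, now_ms - misfire_threshold_ms),
).fetchone()[0]

print(manual, acquired)  # 3 0
```

Comparing the sched_name, trigger_state, and next_fire_time values of the three rows against the parameters bound in the debugger should show which condition, if any, excludes them.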

Jun 29 '24 06:06 1032851561

Is the date on the master machine correct?

Jun 29 '24 07:06 ruanwenjun

The date is correct.

Jul 01 '24 06:07 1032851561

@1032851561 If this occurs next time, please provide the whole log of your masters. I have no idea now.

Jul 10 '24 17:07 ruanwenjun

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

Aug 21 '24 00:08 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

Aug 28 '24 00:08 github-actions[bot]
