Elsa.Quartz triggers deleted on server restart (3.3.x)

Open endrelovas opened this issue 6 months ago • 6 comments

Description

We recently upgraded our project from version 3.2 to 3.3. All existing workflows continue to function without any issues. However, we’ve encountered a problem with newly created workflows that are triggered via cron expressions.

After restarting the application, the new entries are removed from the QRTZ_TRIGGERS table. Interestingly, workflows created under version 3.2 remain intact — only the newly published crons are being deleted.

Steps to Reproduce

  1. Create a new workflow with CRON
  2. Restart the server
  3. Table items removed - CRON jobs not fired anymore

Expected Behavior

Jobs to fire after restart

Actual Behavior

They are not fired, also wiped from DB

Screenshots

Before restart:

Image

After restart: Image

Environment

3.3.x (all editions) K8S - mcr.microsoft.com/dotnet/aspnet:9.0 container + MS SQL Server

Log Output

None

Troubleshooting Attempts

Downgrading to 3.2 solved the issues.

Additional Context

Related Issues

There is another issue that seems related. We have a workflow that uses an HTTP endpoint trigger and appears to be in a “published” state. However, after a restart, calling the endpoint results in a 404 error. Interestingly, if I click “publish” again, everything starts working as expected.

This leads me to believe that, possibly, the application incorrectly assumes some workflows are not in a published state after startup — even though they are. This could also explain why the cron-based workflows are being deleted on startup.

endrelovas avatar May 20 '25 06:05 endrelovas

Hi @endrelovas, could you try to reproduce this issue with the recently released 3.4.0 version? I have not seen it during my recent testing with Cron or HTTP Endpoint activities, so chances are it has already been resolved.

sfmskywalker avatar May 21 '25 06:05 sfmskywalker

It is the same with 3.4.0. Frustrating.

endrelovas avatar Jun 11 '25 04:06 endrelovas

I just noticed one more thing: the table gets wiped as soon as I terminate the application. When I restart it, the table is not repopulated.

What is the expected behavior here? Should all data in QRTZ_TRIGGERS be deleted when the application exits and then repopulated on restart? Or is it intended for the data to persist across restarts?

endrelovas avatar Jun 11 '25 04:06 endrelovas

I can’t reproduce this. Are you able to reproduce it with a (simplified) project on your local machine?

The expected behavior is that the Quartz triggers table remains populated. From what I’ve seen with others, only disabling the Quartz feature causes the triggers to disappear.

sfmskywalker avatar Jun 11 '25 05:06 sfmskywalker

Does an application shutdown count as disabling the feature?

This is what we see upon shutdown:

[11:56:49 INF] HTTP POST /_blazor/negotiate responded 200 in 0.3360 ms
[11:56:49 INF] Scheduler QuartzScheduler_$_NON_CLUSTERED shutting down.
[11:56:49 INF] Scheduler QuartzScheduler_$_NON_CLUSTERED paused.
[11:56:49 INF] Scheduler QuartzScheduler_$_NON_CLUSTERED Shutdown complete.
[11:56:49 INF] Registering datasource 'default' with db provider: 'Quartz.Impl.AdoJobStore.Common.DbProvider'
[11:56:49 INF] Using object serializer: Quartz.Simpl.JsonObjectSerializer, Quartz.Serialization.Json
[11:56:49 INF] Initialized Scheduler Signaller of type: Quartz.Core.SchedulerSignalerImpl
[11:56:49 INF] Quartz Scheduler created
[11:56:49 INF] JobFactory set to: Quartz.Simpl.MicrosoftDependencyInjectionJobFactory
[11:56:49 INF] Detected usage of SqlServerDelegate - defaulting 'selectWithLockSQL' to 'SELECT * FROM [quartz].qrtz_LOCKS WITH (UPDLOCK,ROWLOCK) WHERE SCHED_NAME = @schedulerName AND LOCK_NAME = @lockName'.
[11:56:49 INF] Using db table-based data access locking (synchronization).
[11:56:49 INF] Successfully validated presence of 10 schema objects
[11:56:49 INF] JobStoreTX initialized.
[11:56:49 INF] Quartz Scheduler 3.14.0.0 - 'QuartzScheduler' with instanceId 'NON_CLUSTERED' initialized
[11:56:49 INF] Using thread pool 'Quartz.Simpl.DefaultThreadPool', size: 10
[11:56:49 INF] Using job store 'Quartz.Impl.AdoJobStore.JobStoreTX', supports persistence: True, clustered: True
[11:56:49 INF] Adding 0 jobs, 0 triggers.

Curiously, Quartz gets reinitialized during shutdown.

endrelovas avatar Jun 13 '25 10:06 endrelovas

Furthermore, I increased the log level to debug. This is what happens when I gracefully stop Elsa (pressing Ctrl+C in the running console):

[06:02:18 DBG] Lock 'TRIGGER_ACCESS' is desired by: 4ad7a24d-04b5-493e-8159-8260ea760e47
[06:02:18 DBG] Prepared SQL: SELECT * FROM [quartz].qrtz_LOCKS WITH (UPDLOCK,ROWLOCK) WHERE SCHED_NAME = @schedulerName AND LOCK_NAME = @lockName
[06:02:18 DBG] Lock 'TRIGGER_ACCESS' is being obtained: 4ad7a24d-04b5-493e-8159-8260ea760e47
[06:02:18 DBG] Lock 'TRIGGER_ACCESS' given to: 4ad7a24d-04b5-493e-8159-8260ea760e47
[06:02:18 DBG] Prepared SQL: SELECT J.JOB_NAME, J.JOB_GROUP, J.IS_DURABLE, J.JOB_CLASS_NAME, J.REQUESTS_RECOVERY FROM [quartz].qrtz_TRIGGERS T, [quartz].qrtz_JOB_DETAILS J WHERE T.SCHED_NAME = @schedulerName AND T.SCHED_NAME = J.SCHED_NAME AND T.TRIGGER_NAME = @triggerName AND T.TRIGGER_GROUP = @triggerGroup AND T.JOB_NAME = J.JOB_NAME AND T.JOB_GROUP = J.JOB_GROUP
[06:02:18 DBG] Prepared SQL: DELETE FROM [quartz].qrtz_SIMPLE_TRIGGERS WHERE SCHED_NAME = @schedulerName AND TRIGGER_NAME = @triggerName AND TRIGGER_GROUP = @triggerGroup
[06:02:18 DBG] Prepared SQL: DELETE FROM [quartz].qrtz_CRON_TRIGGERS WHERE SCHED_NAME = @schedulerName AND TRIGGER_NAME = @triggerName AND TRIGGER_GROUP = @triggerGroup
[06:02:18 DBG] Prepared SQL: DELETE FROM [quartz].qrtz_TRIGGERS WHERE SCHED_NAME = @schedulerName AND TRIGGER_NAME = @triggerName AND TRIGGER_GROUP = @triggerGroup
[06:02:18 DBG] Lock 'TRIGGER_ACCESS' returned by: 4ad7a24d-04b5-493e-8159-8260ea760e47

It is clear that Quartz deletes the trigger rows during shutdown.

endrelovas avatar Jun 16 '25 04:06 endrelovas
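
For anyone trying to confirm the same symptom in their own logs, here is a minimal sketch that scans debug output for the `Prepared SQL: DELETE FROM ...` lines shown above. The function name `deleted_tables` and the regular expression are illustrative assumptions, not part of Elsa or Quartz.NET; the sample lines are copied from the log excerpt in this thread.

```python
import re

# Matches the "Prepared SQL: DELETE FROM <table>" debug lines seen in the log above.
DELETE_RE = re.compile(r"Prepared SQL: DELETE FROM (\S+)", re.IGNORECASE)


def deleted_tables(log_lines):
    """Return the table names targeted by DELETE statements, in log order."""
    tables = []
    for line in log_lines:
        match = DELETE_RE.search(line)
        if match:
            tables.append(match.group(1))
    return tables


# Sample lines copied from the shutdown log in this thread.
sample = [
    "[06:02:18 DBG] Lock 'TRIGGER_ACCESS' given to: 4ad7a24d-04b5-493e-8159-8260ea760e47",
    "[06:02:18 DBG] Prepared SQL: DELETE FROM [quartz].qrtz_CRON_TRIGGERS WHERE SCHED_NAME = @schedulerName AND TRIGGER_NAME = @triggerName AND TRIGGER_GROUP = @triggerGroup",
    "[06:02:18 DBG] Prepared SQL: DELETE FROM [quartz].qrtz_TRIGGERS WHERE SCHED_NAME = @schedulerName AND TRIGGER_NAME = @triggerName AND TRIGGER_GROUP = @triggerGroup",
]

for table in deleted_tables(sample):
    print(table)
```

If the trigger tables show up in this list while the application is stopping, the deletion is happening at shutdown rather than at startup.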

I see the issue now. When multitenancy was added, the Tenant Deactivating event will clear the triggers, because when a tenant is unregistered at runtime, we want to clear the triggers. The problem is that the event is also called when shutting down the app, causing the triggers to be removed and then re-added when the application starts (at least this is what happens in 3.5).

We need to differentiate between a tenant being unregistered and the system shutting down. Only when a tenant is unregistered should its triggers be removed.

sfmskywalker avatar Aug 05 '25 06:08 sfmskywalker
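
The distinction described in the last comment can be illustrated with a small, language-neutral sketch. This is not Elsa's implementation: `DeactivationReason`, `TriggerStore`, and `on_tenant_deactivating` are hypothetical names, and the in-memory dictionary only stands in for the persisted `QRTZ_TRIGGERS` rows. It assumes the deactivation handler can be told why the tenant is being deactivated.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class DeactivationReason(Enum):
    """Hypothetical reasons a tenant-deactivating event might fire."""
    TENANT_UNREGISTERED = auto()  # tenant removed while the app keeps running
    APP_SHUTDOWN = auto()         # host process is stopping


@dataclass
class TriggerStore:
    """Stand-in for persisted trigger rows, keyed by tenant id."""
    rows: dict = field(default_factory=dict)

    def add(self, tenant_id: str, trigger_name: str) -> None:
        self.rows.setdefault(tenant_id, set()).add(trigger_name)

    def clear(self, tenant_id: str) -> None:
        self.rows.pop(tenant_id, None)


def on_tenant_deactivating(store: TriggerStore, tenant_id: str,
                           reason: DeactivationReason) -> None:
    # Only a real unregistration removes persisted triggers;
    # on shutdown they are left in place so they survive a restart.
    if reason is DeactivationReason.TENANT_UNREGISTERED:
        store.clear(tenant_id)


store = TriggerStore()
store.add("tenant-a", "cron-1")
store.add("tenant-b", "cron-2")

on_tenant_deactivating(store, "tenant-a", DeactivationReason.APP_SHUTDOWN)
on_tenant_deactivating(store, "tenant-b", DeactivationReason.TENANT_UNREGISTERED)

print(store.rows)  # {'tenant-a': {'cron-1'}}
```

In this sketch the shutdown path leaves `tenant-a`'s trigger untouched, while the unregistered `tenant-b` loses its trigger, which matches the behavior the comment asks for.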