elsa-core icon indicating copy to clipboard operation
elsa-core copied to clipboard

[BUG] Cron Expression causes workflow to fire twice in a row

Open MaxBrooks114 opened this issue 1 year ago • 19 comments

Description

I have a workflow that is triggered by a cron expression to run once a day. For some reason it causes the workflow to be triggered twice immediately. The workflow then never completes and stays in the suspended state even though all the activities have been completed successfully.

Steps to Reproduce

  1. Detailed Steps: var cronTrigger = new Cron { CronExpression = new("0 30 16 * * ?"), CanStartWorkflow = true };

  2. Reproduction Rate: Happens every time.

Expected Behavior

Workflow triggers once and completes.

Actual Behavior

workflow triggers twice and enters suspended state.

Screenshots

image

Environment

Elsa 3.1.3, .net 8 on windows

Additional Context

it is a flowchart workflow.

MaxBrooks114 avatar Jun 03 '24 16:06 MaxBrooks114

Based on the information provided, it seems like there might be an issue with the cron job setup or the workflow configuration that is causing the workflow to trigger twice and not complete as expected. Here are a few steps you can take to troubleshoot and resolve the issue:

  1. Check the Cron Expression: Ensure that the cron expression "0 30 16 * * ?" is correct and corresponds to the intended schedule (once a day at 16:30). You can use online cron expression generators to validate the expression.

  2. Review Workflow Design: Examine the workflow design to ensure there are no loops or conditions that could cause the workflow to re-trigger itself.

  3. Check for Multiple Triggers: Verify that there is only one trigger configured to start the workflow and that there are no additional triggers that could cause the workflow to execute twice.

  4. Inspect Logs: Review the application logs for any errors or warnings that could provide more insight into why the workflow is entering a suspended state.

  5. Update Elsa Version: If you are not using the latest version of Elsa, consider updating to the latest stable release, as the issue might have been addressed in a newer version.

  6. Seek Support: If the issue persists, consider reaching out to the Elsa community or support channels for assistance. You can create an issue on the Elsa GitHub repository or ask a question on Stack Overflow with the elsa-workflows tag.

  7. Isolate the Issue: Try to isolate the issue by creating a simple workflow with the same cron trigger to see if the problem is specific to the workflow or the cron setup.

  8. Check for Concurrent Executions: Ensure that the workflow is not configured to allow concurrent executions, which could lead to multiple instances running at the same time.

  9. Workflow Suspension: Investigate why the workflow is being suspended. Workflows can be suspended due to unhandled exceptions or waiting for an event to occur. Make sure all activities are configured correctly and that the workflow is designed to complete.

Unfortunately, without access to the actual workflow definition and environment, it's challenging to provide a more specific solution. If you can provide more details or share the workflow definition, it would be helpful in diagnosing the issue further.

This comment was generated by Glime.

glime-ai[bot] avatar Jun 03 '24 16:06 glime-ai[bot]

The suspended state is unrelated. However the duplicate triggering still happens, unknown why.

MaxBrooks114 avatar Jun 04 '24 15:06 MaxBrooks114

Same problem.I test it by Elsa.Samples.AspNet.Heartbeats.

yutao815 avatar Jun 07 '24 03:06 yutao815

@MaxBrooks114 @yutao815 Can you try with the latest 3.2.0-preview packages to see if the issue still exists?

sfmskywalker avatar Jun 07 '24 16:06 sfmskywalker

I just tried to reproduce this issue, but it works without issue. I will close this as unreproducible, but if you can reproduce the issue with the latest preview packages in a sample project, then please feel free to attach a zip file and I'll check it out. Thank you.

sfmskywalker avatar Jun 07 '24 16:06 sfmskywalker

@sfmskywalker I can confirm this bug still happens for me on preview version 3.2.0-sonar-cloud.1502. I will try to reproduce with a sample project when i have time

MaxBrooks114 avatar Jun 07 '24 19:06 MaxBrooks114

3.2.0-sonar-cloud.1502 Is quite an old build. You might want to try 3.2.0-preview.xxx.

sfmskywalker avatar Jun 07 '24 20:06 sfmskywalker

@sfmskywalker same issue with 3.2.0-preview.1717 too.

MaxBrooks114 avatar Jun 07 '24 21:06 MaxBrooks114

Ok, thank you for checking. I look forward to the repro.

sfmskywalker avatar Jun 07 '24 22:06 sfmskywalker

ElsaCronBugReplication.zip

Here is the bug replicated - it is using the latest 3.3 preview as well. I am using the quartz integration heartbeat workflow sample as a base, and then added a custom activity. It should be noted that your sample as well as mine do not actually use the quartz scheduler or elsa.quartz paackage, only elsa.scheduling.

I have replicated the bug using a specific type of cron expression similar to the ones where i encountered the bug originally, namely a once a day cron expression. Oddly, not only can I reproduce the bug where the job is being fired twice simultaneously, but also repeats every second like the original incarnation of the workflow, like some sort of merging of the two corn expressions. image

as you can see in the screenshot of the code the cron expression should only be firing once, but in the console below we have heartbeat style logs, with two every second

Edit: the cron expression should be firing once a second still. Not twice a second.

MaxBrooks114 avatar Jun 10 '24 19:06 MaxBrooks114

Thank you for the sample project, I'll take a look soon.

sfmskywalker avatar Jun 10 '24 19:06 sfmskywalker

I can confirm the issue; specifically, it happens the first time when triggers don't yet exist in the database.

sfmskywalker avatar Jun 16 '24 22:06 sfmskywalker

I believe CreateSchedulesHostedService cause this behaviour.

DefaultWorkflowDefinitionStorePopulator call GetWorkflowsAsync of CLR Provider. Later IndexTriggersAsync method is used to store triggers and send notifications. ScheduleWorkflows class handles notification and calls ScheduleAsync method of ITriggerScheduler. After all this steps CreateSchedulesHostedService is trying to get all stored triggers and schedule it again.

Because IScheduler is scoped, it can't detect duplicates. If IScheduler, CronosCronParser and Func<IServiceProvider, ICronParser> are registered as singleton scheduler will detect duplicates and replace triggers.

Possible solution will contain changes to SchedullingFeature.cs:

            .AddSingleton<IScheduler, LocalScheduler>()
            ...
            .AddSingleton<CronosCronParser>()
            .AddSingleton(CronParser)

4lexKislitsyn avatar Jun 18 '24 20:06 4lexKislitsyn

@4lexKislitsyn I can confirm this fixed the issue, but you need to make ISystemClock feature a singleton as well. Should I make a pr or will you?

@sfmskywalker please confirm this can be a possible solution.

MaxBrooks114 avatar Jun 20 '24 19:06 MaxBrooks114

@MaxBrooks114 I can create a PR after @sfmskywalker confirm that this solution will not breake something else. Are there other features that should be revised beside ISystemClock?

4lexKislitsyn avatar Jun 21 '24 06:06 4lexKislitsyn

@4lexKislitsyn not as far as I'm aware, it worked fine with your suggested fixes and the systemclock.

MaxBrooks114 avatar Jul 02 '24 15:07 MaxBrooks114

I looked at systemclock feature and it's already registered as singleton in SystemClockFeature.cs.

Also IServiceProvider in LocalScheduler is used for creating of ScheduledCronTask, ScheduledRecurringTask, ScheduledSpecificInstantTask and this classes have IServiceScopeFactory to initialize objects with scoped or transient lifetime. So I think there will not be any issue with it.

I will open PR in few minutes.

4lexKislitsyn avatar Jul 02 '24 15:07 4lexKislitsyn

Is this a dupe? https://github.com/elsa-workflows/elsa-core/issues/5600

dwoldo avatar Jul 02 '24 18:07 dwoldo

@dwoldo Yes, but it also mentions another issue where unpublishing a workflow does not seem to unschedule the timer trigger.

sfmskywalker avatar Jul 02 '24 18:07 sfmskywalker

Fixed via e5ba9a8a9d31541ad24fa0ea0853cce5c4cc9601

sfmskywalker avatar Jul 18 '24 14:07 sfmskywalker