elsa-core icon indicating copy to clipboard operation
elsa-core copied to clipboard

Retention Module Job could cause timeout when deleting large data (or small db)

Open jdevillard opened this issue 3 years ago • 1 comments

Issue detected using SQL Azure Standard S1: 20 DTUs.

I've dig into the retention module and see that no data was really deleted from the Db. After going step by step , I've seen different things :

https://github.com/elsa-workflows/elsa-core/blob/e4bab8810afc201cd7b474fccb1e0757e448d528/src/modules/retention/Elsa.Retention/Jobs/CleanupJob.cs#L52-L69

Record are paged for retrieval but they will be deleted in batch. This can cause timeout when deleting with silent error, and a retry only after the SweepInterval

image

Following the code :

await _workflowInstanceStore.DeleteManyAsync(specification, cancellationToken);

goes through (EventPublishingWorkflowInstanceStore.cs)

        public async Task<int> DeleteManyAsync(ISpecification<WorkflowInstance> specification, CancellationToken cancellationToken = default)
        {
            var instances = await FindManyAsync(specification, cancellationToken: cancellationToken).ToList();
            var count = await _store.DeleteManyAsync(specification, cancellationToken);

            if (instances.Any())
            {
                foreach (var instance in instances)
                    await _mediator.Publish(new WorkflowInstanceDeleted(instance), cancellationToken);

                await _mediator.Publish(new ManyWorkflowInstancesDeleted(instances), cancellationToken);
            }

            return count;
        }

Then in EntityFrameworkWorkflowInstanceStore.cs, a new time FindManyAsync

        public override async Task<int> DeleteManyAsync(ISpecification<WorkflowInstance> specification, CancellationToken cancellationToken = default)
        {
            var workflowInstances = (await FindManyAsync(specification, cancellationToken: cancellationToken)).ToList();
            var workflowInstanceIds = workflowInstances.Select(x => x.Id).ToArray();
            await DeleteManyByIdsAsync(workflowInstanceIds, cancellationToken);
            return workflowInstances.Count;
        }

to finaly DeleteManyByIdsAsync with dependencies.

I think we loose the effort of using the pagination and I think we should delete records for each page instead of using page only for first retrieval

jdevillard avatar Aug 12 '22 14:08 jdevillard

We can also change the way we select data, from older to newer.

https://github.com/elsa-workflows/elsa-core/blob/e4bab8810afc201cd7b474fccb1e0757e448d528/src/modules/retention/Elsa.Retention/Jobs/CleanupJob.cs#L49

to

var orderBy = new OrderBy<WorkflowInstance>(x => x.CreatedAt, SortDirection.asc); 

jdevillard avatar Aug 12 '22 14:08 jdevillard

@jdevillard Closing this since your PR has been merged. Thank you for the improvements!

sfmskywalker avatar Oct 03 '22 08:10 sfmskywalker