Fix: Implement batched deletion to prevent SQL timeouts with large workflow instance datasets

Open Copilot opened this issue 3 months ago • 0 comments

Problem

Bulk deletion of 300K+ workflow instances causes SQL Server timeouts (30s default) and application crashes. The current implementation loads all instance summaries into memory and executes a single DELETE statement.

Changes

Configuration

  • Added ManagementOptions.BulkDeleteBatchSize (default: 1000) to control deletion batch size

Batched Deletion Algorithm

  • Refactored WorkflowInstanceManager.BulkDeleteAsync to process deletions in configurable batches
  • Each iteration fetches only the IDs (not full summaries) for the next batch, always at Offset=0, because the remaining records shift up once a batch is deleted
  • Deletes each batch via DeleteAsync, filtered by that batch's ID list
  • Publishes the existing notification events once per batch
  • Repeats until no matching records remain
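
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: InMemoryInstanceStore, FindIdsAsync, BatchedDeletion, and the onBatchDeleted callback are hypothetical stand-ins for the real store, filter, and notification types, and the zero-deletion guard is an added assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for the instance store (not Elsa's actual API).
public sealed class InMemoryInstanceStore
{
    private readonly List<string> _ids;

    public InMemoryInstanceStore(IEnumerable<string> ids) => _ids = ids.ToList();

    public int Count => _ids.Count;
    public int DeleteCalls { get; private set; }

    // Returns up to `take` IDs starting at `offset`, without loading full records.
    public Task<List<string>> FindIdsAsync(int offset, int take, CancellationToken ct = default) =>
        Task.FromResult(_ids.Skip(offset).Take(take).ToList());

    // Deletes only the given IDs and reports how many records were removed.
    public Task<long> DeleteAsync(IReadOnlyCollection<string> ids, CancellationToken ct = default)
    {
        DeleteCalls++;
        var toRemove = new HashSet<string>(ids);
        long removed = _ids.RemoveAll(toRemove.Contains);
        return Task.FromResult(removed);
    }
}

public static class BatchedDeletion
{
    // Deletes all records in batches of `batchSize` and returns the total removed.
    public static async Task<long> BulkDeleteAsync(
        InMemoryInstanceStore store,
        int batchSize,
        Func<IReadOnlyCollection<string>, Task> onBatchDeleted,
        CancellationToken ct = default)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        long total = 0;
        while (true)
        {
            ct.ThrowIfCancellationRequested();

            // Always read from offset 0: the previous batch is gone,
            // so the next batch is now at the front of the result set.
            var ids = await store.FindIdsAsync(0, batchSize, ct);
            if (ids.Count == 0) break;

            long deleted = await store.DeleteAsync(ids, ct);
            total += deleted;

            // One notification per batch rather than one for the whole operation.
            await onBatchDeleted(ids);

            // Assumed safety guard: stop if a non-empty batch deleted nothing,
            // otherwise the same IDs would be fetched again forever.
            if (deleted == 0) break;
        }
        return total;
    }
}
```

With 2,500 IDs and a batch size of 1000, this sketch issues three deletes (1000, 1000, and 500 IDs) and stops on the fourth, empty fetch. Reading at Offset=0 is only correct because each batch is actually removed before the next fetch; if a delete could silently skip records, paging by offset or by last-seen ID would be needed instead.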

Testing

  • Added unit tests covering multi-batch, single-batch, and empty scenarios

Configuration Example

services.Configure<ManagementOptions>(options =>
{
    options.BulkDeleteBatchSize = 500; // Adjust for DB performance
});

Impact

  • Prevents SQL timeouts by limiting DELETE statement scope
  • Reduces memory footprint: each iteration holds only one batch of IDs in memory instead of summaries for every matching instance
  • Backward compatible: existing code works without changes
  • Observable: added logging at Information and Debug levels

Original prompt

This section details the original issue you should resolve

<issue_title>[Retention] Timeout causes the application to crash.</issue_title> <issue_description>I use SQL Server to store workflow instances and currently have a timer that fires 5 child workflows every second, producing over 300,000 workflow instances in about 3 days. The background cleanup runs every hour to remove workflows that were completed seven days ago, but on the third day a timeout exception is raised, which eventually causes the application to crash.

The current version is 3.5.1.

Image

Below is a screenshot of the system event log:

Image Image

Image</issue_description>

Comments on the Issue (you are @copilot in this section)

  • Fixes elsa-workflows/elsa-core#7097


Copilot, Nov 24 '25 05:11