Long transaction in PostgreSQL when marking large number of splits for deletion

Open earlbread opened this issue 2 months ago • 5 comments

When a large number of splits are marked for deletion, the PostgreSQL metastore runs long transactions that cause database lock contention and performance degradation.

The mark_splits_for_deletion operation processes all splits in a single transaction without batching, which can lock thousands of rows simultaneously.

    // retention_policy_execution.rs
    let mark_splits_for_deletion_request =
        MarkSplitsForDeletionRequest::new(index_uid, expired_split_ids);
    ctx.protect_future(metastore.mark_splits_for_deletion(mark_splits_for_deletion_request))
        .await?;

Janitor logs:

    2025-09-30 09:00:34.807 | 2025-09-30T00:00:34.807Z ERROR quickwit_janitor::actors::retention_policy_executor: Failed to execute the retention policy on the index. index_id=log.common.application_log_v1_quickwit error=request timed out: client
    2025-09-30 09:00:02.393 | 2025-09-30T00:00:02.393Z  INFO quickwit_janitor::retention_policy_execution: Marking 245742 splits for deletion based on retention policy. index_id=log.common.application_log_v1_quickwit split_ids=["01K3W82WGQVVPQBX67JPA51715", "01K3YMY9VYP55QZPX01VHNXJBE", "01K3W6G2FZCEPD6QBM6KHEHE2X", "01K3WDBKTNC3K09KPNV9SVCM4M", "01K3W82VYEN7Q96WFEK6RE5P3Z", and 245737 more]

This problem occurs periodically in the morning hours when retention policies are evaluated and large numbers of splits need to be marked for deletion.

To avoid long-running transactions and database lock contention, these operations should be processed in smaller batches.
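
A minimal sketch of what batching could look like, written as standalone Rust so that it does not depend on the real metastore client. The helper `mark_in_batches`, the constant `MARK_BATCH_SIZE`, and the closure `mark_batch` are hypothetical names for illustration and not part of Quickwit; in the janitor, the closure body would presumably build one MarkSplitsForDeletionRequest per chunk and await it.

```rust
/// Hypothetical batch size, chosen only for illustration; not a Quickwit setting.
const MARK_BATCH_SIZE: usize = 1_000;

/// Calls `mark_batch` once per chunk of at most `batch_size` split IDs and
/// stops at the first error. Returns the number of split IDs passed to
/// successful calls.
///
/// `mark_batch` stands in for one metastore request (one short transaction).
fn mark_in_batches<E>(
    split_ids: &[String],
    batch_size: usize,
    mut mark_batch: impl FnMut(&[String]) -> Result<(), E>,
) -> Result<usize, E> {
    // `chunks` panics on a zero size, so fail early with a clearer message.
    assert!(batch_size > 0, "batch_size must be non-zero");

    let mut marked = 0;
    for batch in split_ids.chunks(batch_size) {
        // Each call covers at most `batch_size` rows instead of the full list.
        mark_batch(batch)?;
        marked += batch.len();
    }
    Ok(marked)
}
```

With the 245742 splits from the log above and a batch size of 1,000, this would issue 246 requests instead of one. The trade-off is that the operation is no longer atomic: if a batch fails, earlier batches stay marked, which seems acceptable here because the next retention policy run would pick up the remaining splits.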

earlbread avatar Sep 30 '25 05:09 earlbread