
HDDS-9198. Changed snapshot purge to single purge instead of batch purge

Open hemantk-12 opened this issue 10 months ago • 0 comments

What changes were proposed in this pull request?

We found two race condition issues, HDDS-10524 and HDDS-10590, which are fixed in PR #6443 and PR #6456 respectively.

There is still an issue with the way batch snapshot purge is currently processed.

As part of the snapshot purge, the deep clean flag of the next active snapshot is updated, along with the global and path previous of the next global and path level snapshots. To do this, OMSnapshotPurgeRequest maintains two maps, updatedSnapInfos and updatedPathPreviousAndGlobalSnapshots, and OMSnapshotPurgeResponse then flushes these maps one after the other. This can cause chain corruption.

For example, assume that as part of the deep clean info update, snapshots are updated as {E -> E', F -> F', B' -> B'', G -> G'} and kept in updatedSnapInfos: [E', F', B'', G'], and that previous snapshots are updated as {A -> A', B -> B', C -> C', D -> D'} and kept in updatedPathPreviousAndGlobalSnapshots: [A', B', C', D']. After the purge, the final snapshot list should be [A', B'', C', D', E', F', G']. Because the maps are added to the batch one after the other, the result is either [A', B', C', D', E', F', G'] or [A', B'', C', D', E', F', G'], depending on which map is added to the batch first. The problem can still exist even if the flush order of the maps is fixed.

Ideally, these should be flushed in the same order the purge batch is processed.
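
The example above can be reproduced with a small, self-contained sketch. It is only an illustration: `FlushOrderSketch`, its methods, and the use of plain `String` keys and values are assumptions made for this sketch and are not the actual Ozone classes or table types. A `LinkedHashMap` stands in for the write batch, where a later put for the same key overwrites an earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: these are not the real Ozone classes or tables.
class FlushOrderSketch {

  // Previous-snapshot updates from the example, keyed by snapshot name.
  static Map<String, String> pathPreviousUpdates() {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("A", "A'");
    m.put("B", "B'");
    m.put("C", "C'");
    m.put("D", "D'");
    return m;
  }

  // Deep clean updates from the example; B'' was derived from B'.
  static Map<String, String> deepCleanUpdates() {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("E", "E'");
    m.put("F", "F'");
    m.put("B", "B''");
    m.put("G", "G'");
    return m;
  }

  // Models a write batch: a later put for the same key overwrites an earlier one.
  static Map<String, String> flush(Map<String, String> first,
                                   Map<String, String> second) {
    Map<String, String> table = new LinkedHashMap<>();
    table.putAll(first);
    table.putAll(second);
    return table;
  }

  public static void main(String[] args) {
    // Path-previous map first: B ends up as the latest version, B''.
    System.out.println("B = " + flush(pathPreviousUpdates(), deepCleanUpdates()).get("B"));
    // Deep clean map first: the stale B' overwrites B''.
    System.out.println("B = " + flush(deepCleanUpdates(), pathPreviousUpdates()).get("B"));
  }
}
```

In this sketch the outcome for B depends only on which map is written last, which matches the two possible results listed in the example.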

This change fixes the issue by making snapshot purge take one snapshot at a time rather than a list of snapshots. For backward compatibility, when a Ratis transaction contains a list of snapshots, a new object is introduced to maintain the order of the updates and flush them in the same order in which they were made in OMSnapshotPurgeRequest.
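
One way to picture the order-preserving object is a single log of updates that is replayed in the order the updates were recorded. The sketch below is a minimal illustration under that assumption; `OrderedSnapshotUpdates`, `record`, and `flush` are hypothetical names and do not reflect the object actually introduced by this PR.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the object introduced by the PR.
class OrderedSnapshotUpdates {

  // One entry per update, kept in the order the updates were made.
  private final List<Map.Entry<String, String>> log = new ArrayList<>();

  // Records an update, whether it comes from the deep clean path
  // or from the path/global previous path.
  public void record(String snapshotKey, String updatedInfo) {
    log.add(new SimpleImmutableEntry<>(snapshotKey, updatedInfo));
  }

  // Replays the log in order, so the last recorded version of a key wins.
  public Map<String, String> flush() {
    Map<String, String> table = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : log) {
      table.put(e.getKey(), e.getValue());
    }
    return table;
  }

  public static void main(String[] args) {
    OrderedSnapshotUpdates updates = new OrderedSnapshotUpdates();
    updates.record("B", "B'");   // previous-snapshot update
    updates.record("E", "E'");   // deep clean update
    updates.record("B", "B''");  // later deep clean update built on B'
    System.out.println("B = " + updates.flush().get("B")); // prints B = B''
  }
}
```

Because both kinds of update share one log, the flush no longer depends on which map happens to be written first.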

What is the link to the Apache JIRA?

HDDS-9198

How was this patch tested?

Added and updated unit tests.

hemantk-12 avatar Apr 08 '24 04:04 hemantk-12