HDDS-10136. Recon displaying DELETED container as missing.
What changes were proposed in this pull request?
Root Cause
The root cause lies in how container state transitions are handled. During deletion, while a container is in the DELETING state, Recon may mark it as MISSING because no healthy replicas are reported for it. This happens because Recon checks container state only periodically, so until the next sync the transition to DELETED may not yet be reflected in Recon.
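To make the race concrete, here is a minimal, hypothetical model of it (not Recon code). It assumes Recon keeps a cached copy of each container's state that lags behind SCM, and that the pre-fix health check classifies a container as MISSING purely because it has zero healthy replicas. The class, enum, and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the stale-state race; not the actual Recon code.
 * Recon's cached container state is assumed to lag behind SCM's.
 */
public class StaleStateRace {

    // Simplified stand-in for a subset of the container lifecycle states.
    enum State { CLOSED, DELETING, DELETED }

    // Outcome of the naive health check.
    enum Health { HEALTHY, MISSING }

    // Naive check (assumption): zero healthy replicas means MISSING,
    // regardless of where the container is in its lifecycle.
    static Health naiveCheck(int healthyReplicas) {
        return healthyReplicas == 0 ? Health.MISSING : Health.HEALTHY;
    }

    public static void main(String[] args) {
        Map<Long, State> scmView = new HashMap<>();
        Map<Long, State> reconView = new HashMap<>();

        long containerId = 1L;
        // SCM has already finished deleting the container and its replicas...
        scmView.put(containerId, State.DELETED);
        // ...but Recon has not synced yet and still sees DELETING.
        reconView.put(containerId, State.DELETING);
        int healthyReplicas = 0;

        System.out.println("SCM: " + scmView.get(containerId)
            + ", Recon: " + reconView.get(containerId)
            + ", health: " + naiveCheck(healthyReplicas));
        // prints "SCM: DELETED, Recon: DELETING, health: MISSING"
    }
}
```

Because the naive check never consults SCM's state, a container that SCM already considers DELETED is still reported as MISSING until Recon's next sync.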
Known Behavior
This is known behavior and is generally not a cause for concern. The discrepancy is temporary and resolves itself once the ContainerHealthTask in Recon synchronizes the container state with SCM. The brief window in which a container is shown as MISSING despite already being DELETED in SCM is caused by the inherent delay in state synchronization between Recon and SCM.
Solution
The solution modifies the ContainerHealthTask in Recon to handle these states correctly:
- Skip containers already DELETED in SCM: When Recon finds a container marked as MISSING (for example, one that Recon still sees as DELETING), it now checks whether the container is actually in the DELETED state in SCM. If so, it skips any further processing for that container.
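A minimal sketch of the guard described above, assuming SCM's view of container states can be modeled as a simple `Map` lookup. `MissingContainerGuard`, `check`, and the `SKIPPED` outcome are hypothetical names for illustration, not the actual ContainerHealthTask code.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the "skip if DELETED in SCM" guard; names are illustrative.
 * scmStates stands in for a lookup against SCM (assumption).
 */
public class MissingContainerGuard {

    // Simplified stand-in for a subset of the container lifecycle states.
    enum State { CLOSED, DELETING, DELETED }

    enum Health { HEALTHY, MISSING, SKIPPED }

    static Health check(long containerId, int healthyReplicas,
                        Map<Long, State> scmStates) {
        if (healthyReplicas > 0) {
            return Health.HEALTHY;
        }
        // Candidate for MISSING: confirm with SCM before reporting it.
        if (scmStates.get(containerId) == State.DELETED) {
            // SCM already deleted it; Recon's view is stale, so skip further processing.
            return Health.SKIPPED;
        }
        // Unknown to SCM or in any other state: still report as MISSING.
        return Health.MISSING;
    }

    public static void main(String[] args) {
        Map<Long, State> scm = Map.of(1L, State.DELETED, 2L, State.CLOSED);
        System.out.println(check(1L, 0, scm)); // SKIPPED
        System.out.println(check(2L, 0, scm)); // MISSING
        System.out.println(check(2L, 3, scm)); // HEALTHY
    }
}
```

The sketch returns a distinct `SKIPPED` result so that a container already deleted in SCM is neither reported as MISSING nor counted as HEALTHY; that split is a design choice of this example only.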
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-10136
How was this patch tested?
Unit tests.