DAOS-17591 dtx: handle orphan DTX entries
The current DTX resync mechanism performs leader-driven scanning of the specified container. However, if the current DTX leader is dead, the new DTX leader moves to another target, on which the related DTX entry may not exist or may already have been committed. In that case, DTX resync on the new DTX leader does not handle that DTX entry, so the matching DTX entries on the other non-leader targets may become "orphans".
Such orphan DTX entries may affect subsequent rebuild. This patch introduces a DTX orphan cleanup mechanism that handles them before rebuild scans the related container.
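As a rough illustration of the orphan condition described above (not the code in this patch), the check can be sketched as follows. All names here (`sketch_dtx_state`, `sketch_dtx_view`, `sketch_dtx_is_orphan`, `sketch_dtx_count_orphans`) are hypothetical and do not correspond to real DAOS symbols. The sketch only classifies entries and makes no assumption about how the cleanup resolves them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry states for illustration; not the real DAOS definitions. */
enum sketch_dtx_state {
	SKETCH_DTX_ABSENT,	/* no entry for this transaction on the target */
	SKETCH_DTX_PREPARED,	/* entry exists but is not committed yet */
	SKETCH_DTX_COMMITTED,	/* entry exists and has been committed */
};

/* Hypothetical view of one transaction from a non-leader target. */
struct sketch_dtx_view {
	enum sketch_dtx_state	on_new_leader;	/* state seen on the new leader */
	enum sketch_dtx_state	on_local;	/* state on this non-leader target */
};

/*
 * An entry is treated as an orphan candidate when it is still pending
 * locally, while the new leader either has no such entry or has already
 * committed it. Leader-driven resync would skip it in both cases.
 */
static bool
sketch_dtx_is_orphan(const struct sketch_dtx_view *v)
{
	if (v->on_local != SKETCH_DTX_PREPARED)
		return false;

	return v->on_new_leader != SKETCH_DTX_PREPARED;
}

/* Count the orphan candidates that would need handling before rebuild scan. */
static size_t
sketch_dtx_count_orphans(const struct sketch_dtx_view *views, size_t nr)
{
	size_t	count = 0;
	size_t	i;

	for (i = 0; i < nr; i++) {
		if (sketch_dtx_is_orphan(&views[i]))
			count++;
	}

	return count;
}
```

In this sketch, an entry that is still pending on both the new leader and the local target is not an orphan, because regular leader-driven resync is expected to cover it.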
Steps for the author:
- [ ] Commit message follows the guidelines.
- [ ] Appropriate Features or Test-tag pragmas were used.
- [ ] Appropriate Functional Test Stages were run.
- [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
- [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.
After all prior steps are complete:
- [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).
Ticket title is 'Active DTX cleanup after global metadata verification'. Status is 'In Progress'. https://daosio.atlassian.net/browse/DAOS-17591
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16482/3/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16482/5/testReport/
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16482/7/testReport/
Ping reviewers, thanks!
@NiuYawei could you check this patch?