DAOS-17591 dtx: handle orphan DTX entries - b26
The current DTX resync mechanism scans the specified container under the direction of the DTX leader. If the current DTX leader dies, leadership switches to another target, on which the related DTX entry may not exist or may already have been committed. In that case, DTX resync on the new leader does not handle the entry, so the copies of that DTX entry on the non-leader targets may become "orphans".
Such orphan DTX entries may affect a subsequent rebuild. This patch introduces a DTX orphan cleanup mechanism that handles them before rebuild scans the related container.
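
To make the orphan condition concrete, here is a minimal sketch of the detection rule described above. It is only an illustrative model, not the code in this patch: `DtxState`, `find_orphans`, and the dictionary representation of per-target DTX tables are all hypothetical names, and the sketch says nothing about how the patch actually resolves an orphan once it is found.

```python
from enum import Enum
from typing import Dict, List


class DtxState(Enum):
    """Hypothetical, simplified DTX entry states for illustration."""
    PREPARED = "prepared"
    COMMITTED = "committed"


def find_orphans(local: Dict[str, DtxState],
                 new_leader: Dict[str, DtxState]) -> List[str]:
    """Return IDs of entries on a non-leader target that a
    leader-driven resync on the new leader would not handle."""
    orphans = []
    for dtx_id, state in local.items():
        if state is not DtxState.PREPARED:
            continue  # already committed locally, nothing left pending
        leader_state = new_leader.get(dtx_id)
        # Assumption: the new leader's resync only revisits entries that
        # are still prepared on the leader itself. An entry that is
        # missing or already committed there is skipped by that scan.
        if leader_state is None or leader_state is DtxState.COMMITTED:
            orphans.append(dtx_id)
    return sorted(orphans)


# Example: three pending entries on a non-leader target.
local = {
    "dtx-a": DtxState.PREPARED,   # missing on the new leader
    "dtx-b": DtxState.PREPARED,   # already committed on the new leader
    "dtx-c": DtxState.PREPARED,   # still prepared on the new leader
    "dtx-d": DtxState.COMMITTED,  # not pending locally
}
new_leader = {"dtx-b": DtxState.COMMITTED, "dtx-c": DtxState.PREPARED}
orphan_ids = find_orphans(local, new_leader)
```

In this model `dtx-c` is left to the regular resync, while `dtx-a` and `dtx-b` are the entries that a cleanup pass would need to visit before rebuild scans the container.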
Steps for the author:
- [ ] Commit message follows the guidelines.
- [ ] Appropriate Features or Test-tag pragmas were used.
- [ ] Appropriate Functional Test Stages were run.
- [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
- [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.
After all prior steps are complete:
- [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).
Ticket title is 'Active DTX cleanup after global metadata verification'. Status is 'In Progress'. https://daosio.atlassian.net/browse/DAOS-17591
Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16483/3/execution/node/1431/log
Test stage Functional Hardware Medium Verbs Provider completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16483/3/testReport/
Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16483/4/execution/node/1427/log
Test stage Functional Hardware Medium Verbs Provider completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16483/4/testReport/
Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16483/6/execution/node/730/log
test_daos_rebuild_ec failed because of DAOS-17773, which is unrelated to this patch.