DAOS-17598 vos: misc enhancements for handling DTX conflict - b26
- Use IT_FOR_CHECK when iterating OBJ for DDB dv_path_verify
From the DDB perspective, a target with a non-aborted DTX should be visible even if the related DTX is prepared locally but not yet globally committed. It is the caller's duty to decide how to handle a non-committed target in subsequent logic; otherwise DTX resync will handle such a DTX later.
DDB logic will use DAOS_INTENT_CHECK instead of DAOS_INTENT_DEFAULT to indicate this when iterating OBJ during dv_path_verify().
- Do not create the VOS LRU metrics entry when initializing standalone TLS. This avoids confusing error messages from DDB utils.
- Dump DTX information when a conflict is detected.
- Introduce a VOS diagnose mode that allows a pool/container to be opened even if there is some corruption, so that the related issue can be fixed or handled via subsequent operations. It is controlled via the server-side environment variable "DAOS_DIAG_MODE", which is zero by default. If the user wants to handle potential VOS corruption when opening the pool/container, set it to 1 for "check" or 2 for "repair" explicitly when starting the engine. If some corruption is detected under "check" mode, the related inconsistency or corruption will be dumped, but opening the related pool/container will ultimately fail to avoid further damage to the system.
- Verify the validity of active DTX entries when reindexing them, and filter out invalid ones. If a conflicting DTX is hit and DAOS_DIAG_MODE is set to "repair", assign it a new local ID. Subsequent DTX resync can then handle such entries globally.
- Do not clear the "dae_need_release" flag after a partial commit, to avoid the DTX entry being evicted from DRAM but left on disk.
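
For reviewers, here is a minimal sketch of how the "DAOS_DIAG_MODE" values described above (0 by default, 1 for "check", 2 for "repair") could be mapped to an internal mode. The enum and function names are hypothetical and are not taken from the patch. Treating an unset, empty, or unrecognized value as "off" is also an assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical names; only the values 0/1/2 come from the description. */
enum diag_mode {
	DIAG_MODE_OFF    = 0, /* default: diagnose mode disabled */
	DIAG_MODE_CHECK  = 1, /* dump corruption, then fail the open */
	DIAG_MODE_REPAIR = 2, /* try to repair, e.g. reassign DTX local ID */
};

/*
 * Parse the string value of DAOS_DIAG_MODE.
 * Assumption: NULL, empty, non-numeric, or out-of-range input means "off".
 */
static enum diag_mode
diag_mode_parse(const char *val)
{
	char *end = NULL;
	long  num;

	if (val == NULL || *val == '\0')
		return DIAG_MODE_OFF;

	errno = 0;
	num = strtol(val, &end, 10);
	if (errno != 0 || end == val || *end != '\0')
		return DIAG_MODE_OFF;

	switch (num) {
	case 1:
		return DIAG_MODE_CHECK;
	case 2:
		return DIAG_MODE_REPAIR;
	default:
		return DIAG_MODE_OFF;
	}
}

/* Human-readable name, useful for log messages. */
static const char *
diag_mode_name(enum diag_mode mode)
{
	switch (mode) {
	case DIAG_MODE_CHECK:
		return "check";
	case DIAG_MODE_REPAIR:
		return "repair";
	default:
		return "off";
	}
}
```

A caller would typically evaluate `diag_mode_parse(getenv("DAOS_DIAG_MODE"))` once at engine start and log the result via `diag_mode_name()`.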
Signed-off-by: Fan Yong [email protected]
Steps for the author:
- [ ] Commit message follows the guidelines.
- [ ] Appropriate Features or Test-tag pragmas were used.
- [ ] Appropriate Functional Test Stages were run.
- [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
- [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.
After all prior steps are complete:
- [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).
Errors are Unable to load ticket data https://daosio.atlassian.net/browse/DAOS-17598
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/2/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/2/testReport/
Test stage Unit Test bdev with memcheck on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16445/3/display/redirect
Test stage NLT on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16445/3/display/redirect
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/4/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/5/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/7/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/8/testReport/
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/9/testReport/
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/10/testReport/
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/11/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/11/testReport/
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/12/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/12/testReport/
Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16445/12/display/redirect
Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16445/12/execution/node/1393/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16445/12/execution/node/1483/log
Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16445/13/display/redirect
Test stage Functional Hardware Medium completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/13/testReport/
Test stage Functional Hardware Medium Verbs Provider completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16445/13/testReport/
test_ior_small failed because of DAOS-17573, which is not related to this patch.
This is no longer needed.