DAOS-17737 dtx: handle race between DTX refresh and DTX abort - b26
If current transaction is aborted during dtx_refresh() yield by race, then return non-zero value to the sponsor to trigger client side RPC retry. That will make related transaction's status to be more clean.
More check after dtx_refresh() to avoid re-initializing aborted DTX.
The patch also cleanup the usage for vos_dtx_validation() to handle kinds of DTX abort (and maybe resent after that) cases.
Steps for the author:
- [ ] Commit message follows the guidelines.
- [ ] Appropriate Features or Test-tag pragmas were used.
- [ ] Appropriate Functional Test Stages were run.
- [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
- [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.
After all prior steps are complete:
- [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).
Ticket title is '"D_ASSERT(dth->dth_ent != NULL);" failure in dtx_handle_reinit()' Status is 'In Progress' Labels: 'ALCF,alcf_track,hpe_cluster' https://daosio.atlassian.net/browse/DAOS-17737
Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/1/execution/node/1341/log
Test stage Unit Test bdev on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16536/2/testReport/
Test stage Unit Test bdev with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16536/2/testReport/
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16536/2/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16536/2/testReport/
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/4/execution/node/885/log
Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/4/execution/node/930/log
Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/4/execution/node/840/log
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16536/7/display/redirect
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16536/8/display/redirect
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16536/9/display/redirect
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/10/execution/node/707/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/10/execution/node/707/log
DAOS_Rebuild_EC.REBUILD46 failed for DAOS-17773, not related with the patch