DAOS-17737 dtx: handle race between DTX refresh and DTX abort
If current transaction is aborted during dtx_refresh() yield by race, then return non-zero value to the sponsor to trigger client side RPC retry. That will make related transaction's status to be more clean.
More check after dtx_refresh() to avoid re-initializing aborted DTX.
The patch also cleanup the usage for vos_dtx_validation() to handle kinds of DTX abort (and maybe resent after that) cases.
Steps for the author:
- [ ] Commit message follows the guidelines.
- [ ] Appropriate Features or Test-tag pragmas were used.
- [ ] Appropriate Functional Test Stages were run.
- [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
- [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.
After all prior steps are complete:
- [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).
Ticket title is '"D_ASSERT(dth->dth_ent != NULL);" failure in dtx_handle_reinit()' Status is 'In Progress' Labels: 'ALCF,hpe_cluster' https://daosio.atlassian.net/browse/DAOS-17737
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16535/4/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16535/4/testReport/
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16535/6/execution/node/885/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16535/6/execution/node/930/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16535/6/execution/node/840/log
@NiuYawei @wangshilong , please help to review the patch that you have ever reviewed. Thanks!