DAOS-17737 dtx: handle race between DTX refresh and DTX abort - b26

Open Nasf-Fan opened this issue 6 months ago • 6 comments

If the current transaction is aborted by a racing operation while dtx_refresh() yields, return a non-zero value to the sponsor to trigger a client-side RPC retry. That leaves the related transaction's status cleaner.

Add more checks after dtx_refresh() to avoid re-initializing an aborted DTX.
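
The pattern described above can be illustrated with a minimal, self-contained sketch. It is not the patch itself: every name here (`toy_dtx_handle`, `toy_refresh`, `toy_reinit`, `TOY_ERR_RETRY`, and so on) is a hypothetical stand-in, and the yield is modeled by a plain callback rather than a real scheduler. The only point is the ordering: re-check the handle after the refresh returns, and report a retryable error instead of re-initializing a handle whose entry was torn down in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are DAOS definitions. */
enum { TOY_OK = 0, TOY_ERR_RETRY = -1 };

struct toy_dtx_entry {
	int placeholder;
};

struct toy_dtx_handle {
	struct toy_dtx_entry *ent;          /* NULL once the DTX has been aborted */
	bool                  aborted;      /* set by the abort path */
	int                   reinit_count; /* how many times reinit actually ran */
};

/* Models an abort that runs while the refresh path has yielded. */
void toy_abort(struct toy_dtx_handle *dth)
{
	dth->ent = NULL;
	dth->aborted = true;
}

/* Models a refresh that yields; 'racer', if non-NULL, runs during the yield. */
int toy_refresh(struct toy_dtx_handle *dth, void (*racer)(struct toy_dtx_handle *))
{
	if (racer != NULL)
		racer(dth);
	return TOY_OK;
}

/* Re-initialization needs a live entry, like the assertion in the ticket title. */
void toy_reinit(struct toy_dtx_handle *dth)
{
	assert(dth->ent != NULL);
	dth->reinit_count++;
}

/* Refresh, then re-check the handle before touching it again. */
int toy_refresh_then_reinit(struct toy_dtx_handle *dth,
			    void (*racer)(struct toy_dtx_handle *))
{
	int rc = toy_refresh(dth, racer);

	if (rc != TOY_OK)
		return rc;

	/* The handle may have been aborted during the yield: ask the caller to retry. */
	if (dth->aborted || dth->ent == NULL)
		return TOY_ERR_RETRY;

	toy_reinit(dth);
	return TOY_OK;
}
```

In this sketch, calling `toy_refresh_then_reinit(&dth, toy_abort)` returns `TOY_ERR_RETRY` without ever reaching `toy_reinit()`, whereas passing `NULL` as the racer lets the re-initialization proceed.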

The patch also cleans up the usage of vos_dtx_validation() to handle the various DTX abort cases, including those where the request is resent after the abort.

Steps for the author:

  • [ ] Commit message follows the guidelines.
  • [ ] Appropriate Features or Test-tag pragmas were used.
  • [ ] Appropriate Functional Test Stages were run.
  • [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).

Nasf-Fan avatar Jun 24 '25 15:06 Nasf-Fan

Ticket title is '"D_ASSERT(dth->dth_ent != NULL);" failure in dtx_handle_reinit()' Status is 'In Progress' Labels: 'ALCF,alcf_track,hpe_cluster' https://daosio.atlassian.net/browse/DAOS-17737

github-actions[bot] avatar Jun 24 '25 15:06 github-actions[bot]

Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/1/execution/node/1341/log

daosbuild3 avatar Jun 24 '25 22:06 daosbuild3

Test stage Unit Test bdev on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16536/2/testReport/

daosbuild3 avatar Jun 26 '25 16:06 daosbuild3

Test stage Unit Test bdev with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16536/2/testReport/

daosbuild3 avatar Jun 26 '25 16:06 daosbuild3

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16536/2/testReport/

daosbuild3 avatar Jun 26 '25 16:06 daosbuild3

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16536/2/testReport/

daosbuild3 avatar Jun 26 '25 16:06 daosbuild3

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/4/execution/node/885/log

daosbuild3 avatar Jul 07 '25 23:07 daosbuild3

Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/4/execution/node/930/log

daosbuild3 avatar Jul 08 '25 00:07 daosbuild3

Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/4/execution/node/840/log

daosbuild3 avatar Jul 08 '25 03:07 daosbuild3

Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16536/7/display/redirect

daosbuild3 avatar Jul 21 '25 02:07 daosbuild3

Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16536/8/display/redirect

daosbuild3 avatar Jul 21 '25 07:07 daosbuild3

Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16536/9/display/redirect

daosbuild3 avatar Jul 21 '25 11:07 daosbuild3

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/10/execution/node/707/log

daosbuild3 avatar Jul 21 '25 23:07 daosbuild3

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16536/10/execution/node/707/log

DAOS_Rebuild_EC.REBUILD46 failed because of DAOS-17773, which is not related to this patch.

Nasf-Fan avatar Jul 22 '25 01:07 Nasf-Fan