DAOS-11231 dtx: handle dependency during task conversion
For a task that needs to be handled via a distributed transaction, the client-side IO logic converts it into a transactional task. During dc_tx_convert(), the original task may be freed because it depends on the sub-task(s) created for the existence check inside dc_tx_attach(). That leaves "task" invalid for the subsequent steps of the conversion.
To avoid this, the patch moves the TX commit logic for task conversion into the related existence-check callback, so that it runs before the TX is released.
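The ordering problem described above can be illustrated with a minimal C sketch. Everything in it (struct demo_task, demo_tx_convert(), demo_existence_check_cb() and the reference-counting helpers) is hypothetical and heavily simplified; it is not the DAOS tse_task or dc_tx API. It only assumes that the existence-check sub-task ends up holding the last reference to the original task, so the commit has to happen inside the callback, before that reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified task model for illustration only;
 * this is NOT the DAOS tse_task_t API. */
struct demo_task {
	int  refcount;
	int  committed;  /* set once the TX commit step has run */
	int *freed_flag; /* lets the caller observe when the task is freed */
};

struct demo_task *demo_task_create(int *freed_flag)
{
	struct demo_task *t = calloc(1, sizeof(*t));

	if (t == NULL)
		abort();
	t->refcount = 1;
	t->freed_flag = freed_flag;
	*freed_flag = 0;
	return t;
}

/* Drop one reference; the task is freed when the last one goes away. */
void demo_task_put(struct demo_task *t)
{
	assert(t->refcount > 0);
	if (--t->refcount == 0) {
		*t->freed_flag = 1;
		free(t);
	}
}

/* Hypothetical commit step: only valid while the task is still alive. */
void demo_tx_commit(struct demo_task *t)
{
	t->committed = 1;
}

/* Hypothetical existence-check callback. It still owns a reference on
 * the parent, so the parent is valid here: commit first, then release.
 * After demo_task_put() the parent must not be touched again. */
void demo_existence_check_cb(struct demo_task *parent, int *committed_out)
{
	demo_tx_commit(parent);
	*committed_out = parent->committed;
	demo_task_put(parent);
}

/* Hypothetical conversion. The caller's reference is handed to the
 * existence-check sub-task, which in this sketch completes synchronously,
 * so the parent may already be freed when the callback returns. Calling
 * demo_tx_commit(parent) after this point, as in the pre-patch ordering,
 * would be a use-after-free. */
int demo_tx_convert(struct demo_task *parent, int *committed_out)
{
	demo_existence_check_cb(parent, committed_out);
	return 0;
}
```

A caller would create the task with demo_task_create(&freed), call demo_tx_convert(t, &committed), and then observe committed == 1 and freed == 1 without dereferencing t again.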
Signed-off-by: Fan Yong [email protected]
Bug-tracker data: Ticket title is 'soak: 2.2 mdtest job fails with "Attempting to use OP ID that was not completed"' Status is 'In Review' Labels: 'triaged' https://daosio.atlassian.net/browse/DAOS-11231
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9957/1/display/redirect
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9957/1/display/redirect
Test stage Functional Hardware Small completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9957/1/display/redirect
One or more nodes failed post-provision configuration; to be retested.
Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/454/log
Test stage Build RPM on EL 8.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/343/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/413/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/355/log
Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/451/log
Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/432/log
Test stage Build RPM on EL 8.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/338/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/409/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/397/log
Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/453/log
Is that a clean cherry-pick?
Logically, it is a clean cherry-pick, but because of a baseline difference there is a one-line merge conflict when applying the master version directly to 2.2.
However, I was asked to refresh the master version, so this 2.2 version became stale. I will make another backport once the master version has been reviewed.
Now, the patch is consistent with the master version (https://github.com/daos-stack/daos/pull/9948).
Shall we proceed with this patch?
The DAOS-11231 issue was reported on 2.2. This patch fixes some known task dependency issues that may cause client-side memory corruption, which is suspected to be one possible cause of DAOS-11231. However, as Maureen Jean noted, the issue is not easy to reproduce, so we cannot directly prove that this patch fixes the original DAOS-11231 issue. Even so, I still suggest applying the patch to release/2.2, since it fixes known memory corruption.