DAOS-11231 dtx: handle dependency during task conversion

Open Nasf-Fan opened this issue 3 years ago • 10 comments

For a task that needs to be handled via a distributed transaction, the client-side IO logic converts it into a transactional task. During dc_tx_convert(), the original task may be freed because of its dependency on the sub-task(s) used for the existence check inside dc_tx_attach(). That leaves "task" invalid for the remaining steps of the conversion.

To avoid this case, the patch moves the TX commit logic for task conversion into the related existence-check callback, before the TX release.

Signed-off-by: Fan Yong [email protected]
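The ordering described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: every name in it (`toy_task`, `toy_exist_check_cb`, `toy_tx_convert`, and so on) is hypothetical and is not the DAOS API, the sub-task is assumed to complete synchronously, and "freeing" the task is simulated with a flag so that the order of events can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in DAOS. */
enum toy_event { EV_COMMIT, EV_TX_RELEASE, EV_TASK_FREE };

struct toy_log {
	enum toy_event ev[8];
	size_t         n;
};

struct toy_task {
	bool            alive;     /* false once the task has been "freed" */
	bool            committed; /* set by the commit step */
	struct toy_log *log;       /* records the order of events */
};

static void toy_record(struct toy_log *log, enum toy_event ev)
{
	assert(log->n < 8);
	log->ev[log->n++] = ev;
}

/* Commit step of the conversion; it needs a task that is still valid. */
static void toy_tx_commit(struct toy_task *task)
{
	assert(task->alive); /* in real code this would be a use-after-free */
	task->committed = true;
	toy_record(task->log, EV_COMMIT);
}

/* Stand-in for the scheduler completing and freeing the original task. */
static void toy_task_complete(struct toy_task *task)
{
	toy_record(task->log, EV_TASK_FREE);
	task->alive = false; /* simulated free */
}

/*
 * Callback of the existence-check sub-task. The commit runs here,
 * before the TX release, while the task is still guaranteed valid.
 */
static void toy_exist_check_cb(struct toy_task *task)
{
	toy_tx_commit(task);
	toy_record(task->log, EV_TX_RELEASE);
	toy_task_complete(task);
}

/* Conversion entry point; the sub-task completes synchronously here. */
static void toy_tx_convert(struct toy_task *task)
{
	toy_exist_check_cb(task);
	/* The task may already be gone: it must not be used after this. */
}
```

In this model, a commit placed in `toy_tx_convert()` after the callback returns would trip the `assert(task->alive)` check, which is the toy equivalent of the invalid "task" described in the commit message.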

Nasf-Fan avatar Aug 10 '22 14:08 Nasf-Fan

Bug-tracker data: Ticket title is 'soak: 2.2 mdtest job fails with "Attempting to use OP ID that was not completed"'. Status is 'In Review'. Labels: 'triaged'. https://daosio.atlassian.net/browse/DAOS-11231

github-actions[bot] avatar Aug 10 '22 14:08 github-actions[bot]

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9957/1/display/redirect

daosbuild1 avatar Aug 10 '22 23:08 daosbuild1

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9957/1/display/redirect

daosbuild1 avatar Aug 10 '22 23:08 daosbuild1

Test stage Functional Hardware Small completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9957/1/display/redirect

daosbuild1 avatar Aug 11 '22 00:08 daosbuild1

"One or more nodes failed post-provision configuration!" To be retested.

Nasf-Fan avatar Aug 11 '22 14:08 Nasf-Fan

Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/454/log

daosbuild1 avatar Aug 12 '22 05:08 daosbuild1

Test stage Build RPM on EL 8.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/343/log

daosbuild1 avatar Aug 12 '22 05:08 daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/413/log

daosbuild1 avatar Aug 12 '22 05:08 daosbuild1

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/355/log

daosbuild1 avatar Aug 12 '22 05:08 daosbuild1

Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/3/execution/node/451/log

daosbuild1 avatar Aug 12 '22 05:08 daosbuild1

Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/432/log

daosbuild1 avatar Aug 17 '22 13:08 daosbuild1

Test stage Build RPM on EL 8.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/338/log

daosbuild1 avatar Aug 17 '22 13:08 daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/409/log

daosbuild1 avatar Aug 17 '22 13:08 daosbuild1

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/397/log

daosbuild1 avatar Aug 17 '22 13:08 daosbuild1

Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9957/5/execution/node/453/log

daosbuild1 avatar Aug 17 '22 13:08 daosbuild1

Is that a clean cherry-pick?

johannlombardi avatar Aug 20 '22 08:08 johannlombardi

Is that a clean cherry-pick?

Logically, it is a clean cherry-pick, but because of some baseline differences, there is a one-line merge conflict if the master version is applied directly to 2.2.

On the other hand, I was required to refresh the master version, so this 2.2 version has become stale. I will make another back-port once the master version has been reviewed.

Nasf-Fan avatar Aug 22 '22 14:08 Nasf-Fan

Now, the patch is consistent with the master version (https://github.com/daos-stack/daos/pull/9948).

Nasf-Fan avatar Aug 24 '22 12:08 Nasf-Fan

Shall we proceed with this patch?

johannlombardi avatar Aug 25 '22 17:08 johannlombardi

Shall we proceed with this patch?

The DAOS-11231 issue was reported on 2.2. This patch fixes some known task dependency issues that may cause client-side memory corruption, which is suspected to be one possible cause of DAOS-11231. But as Maureen Jean said, the issue is not easy to reproduce, so we cannot directly prove that the patch fixes the original DAOS-11231 issue. Anyway, I still suggest applying the patch to release/2.2, since it fixes some known memory corruption.

Nasf-Fan avatar Aug 26 '22 03:08 Nasf-Fan