
DAOS-17576 chk: keep orphan pool shard which status is DOWN or DOWNOUT - b26

Open Nasf-Fan opened this issue 8 months ago • 3 comments

When the check engine verifies pool membership, it may discard a rank or target whose status is 'DOWN' or 'DOWNOUT' to release the related space. That logic was fine before. But with the incremental reintegration feature on the way, it needs to be improved: if the check engine removes the related orphan pool shard, the subsequent reintegration has to start from scratch instead of proceeding incrementally.
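As a rough illustration of the rule change only (a minimal sketch, not the actual chk code): the enum, the function names, and the `keep_for_reint` flag below are all hypothetical and do not correspond to real DAOS definitions.

```c
#include <stdbool.h>

/* Hypothetical shard states; the real pool map has its own definitions. */
enum shard_status {
	SHARD_UPIN,
	SHARD_DOWN,
	SHARD_DOWNOUT,
};

/* True for the two states this change is concerned with. */
static bool
is_down_or_downout(enum shard_status status)
{
	return status == SHARD_DOWN || status == SHARD_DOWNOUT;
}

/*
 * Decide whether the DOWN/DOWNOUT rule discards an orphan pool shard.
 *
 * keep_for_reint == false models the previous behavior: the shard is
 * discarded to release its space.
 * keep_for_reint == true models the behavior proposed here: the shard is
 * kept so that a later reintegration can work incrementally.
 *
 * Shards in other states are outside this rule, so it never discards them.
 */
static bool
discard_down_orphan_shard(enum shard_status status, bool keep_for_reint)
{
	if (!is_down_or_downout(status))
		return false;

	return !keep_for_reint;
}
```

With `keep_for_reint` set, a DOWN or DOWNOUT orphan shard survives the check, which is what lets reintegration reuse its data instead of rebuilding from scratch.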

Test-tag: cat_recov

Steps for the author:

  • [ ] Commit message follows the guidelines.
  • [ ] Appropriate Features or Test-tag pragmas were used.
  • [ ] Appropriate Functional Test Stages were run.
  • [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).

Nasf-Fan avatar May 16 '25 05:05 Nasf-Fan

Ticket title is 'Do not discard orphan pool shard which status is DOWN or DOWNOUT'
Status is 'Awaiting backport'
Labels: 'scrubbed_2.6.5'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-17576

github-actions[bot] avatar May 16 '25 05:05 github-actions[bot]

Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/3/execution/node/1289/log

daosbuild3 avatar Jun 17 '25 12:06 daosbuild3

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/3/execution/node/1303/log

daosbuild3 avatar Jun 17 '25 13:06 daosbuild3

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/6/execution/node/1310/log

daosbuild3 avatar Jul 15 '25 09:07 daosbuild3

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/8/execution/node/494/log

daosbuild3 avatar Jul 21 '25 12:07 daosbuild3

Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/8/execution/node/651/log

daosbuild3 avatar Jul 21 '25 12:07 daosbuild3

Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/9/execution/node/461/log

daosbuild3 avatar Jul 22 '25 16:07 daosbuild3

Test stage Unit Test with memcheck on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16395/15/display/redirect

daosbuild3 avatar Aug 04 '25 03:08 daosbuild3

Test stage Unit Test with memcheck on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16395/16/display/redirect

daosbuild3 avatar Aug 04 '25 05:08 daosbuild3

Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/19/execution/node/616/log

daosbuild3 avatar Aug 08 '25 05:08 daosbuild3

Passed all required CI tests.

Nasf-Fan avatar Aug 10 '25 02:08 Nasf-Fan

Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/21/execution/node/1443/log

daosbuild3 avatar Nov 03 '25 13:11 daosbuild3

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/21/execution/node/1429/log

daosbuild3 avatar Nov 03 '25 17:11 daosbuild3

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16395/23/testReport/

daosbuild3 avatar Nov 05 '25 07:11 daosbuild3

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16395/23/testReport/

daosbuild3 avatar Nov 05 '25 09:11 daosbuild3

Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16395/25/execution/node/1349/log

daosbuild3 avatar Nov 08 '25 22:11 daosbuild3

test_ec_online_rebuild_mdtest failed because of DAOS-17751, which is unrelated to this patch. All the other tests passed.

Nasf-Fan avatar Nov 10 '25 09:11 Nasf-Fan

Ping reviewers, thanks!

Nasf-Fan avatar Nov 10 '25 09:11 Nasf-Fan

FYI we don't typically merge cherry-picks without at least 1 approval, unless the gatekeeper also reviews. I will leave this one for Liang :)

daltonbohning avatar Nov 17 '25 23:11 daltonbohning