
DAOS-17547 rebuild: error on stopped ds_pool_child

Open NiuYawei opened this issue 8 months ago • 1 comments

When a faulty SSD is replaced, reintegration is triggered automatically once local setup completes (that is, once ds_pool_child has started).

However, an admin could manually run "dmg pool reintegrate" before local setup is done. In that case we need to return a retry-able error so that reintegration keeps retrying until the local ds_pool_child has started.
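The intended behavior can be illustrated with a minimal, self-contained sketch. This is not the DAOS implementation: every name below (`sketch_pool_child`, `SKETCH_RETRY`, `sketch_setup_tick`, and so on) is hypothetical, and the error values are placeholders rather than real DAOS error codes. It only shows the general pattern of reporting a transient, retry-able status while the local child is not started, so that the caller retries instead of aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; NOT the real DAOS error values. */
enum sketch_rc {
	SKETCH_OK    = 0,
	SKETCH_RETRY = -1, /* transient: caller should try again later */
};

/* Hypothetical stand-in for the per-target pool child state. */
struct sketch_pool_child {
	bool started;    /* true once local setup has completed */
	int  ticks_left; /* simulated amount of setup work remaining */
};

/* Simulates local setup making progress between attempts. */
static void
sketch_setup_tick(struct sketch_pool_child *child)
{
	assert(child != NULL);
	if (child->ticks_left > 0)
		child->ticks_left--;
	if (child->ticks_left == 0)
		child->started = true;
}

/* One reintegration attempt: return a retry-able status, rather than
 * failing hard, while the local child has not started yet. */
static int
sketch_reint_attempt(const struct sketch_pool_child *child)
{
	assert(child != NULL);
	if (!child->started)
		return SKETCH_RETRY;
	return SKETCH_OK;
}

/* Retry until the attempt succeeds or max_attempts is exhausted.
 * Returns the 1-based attempt number that succeeded, or SKETCH_RETRY
 * if the child never started within the attempt budget. */
static int
sketch_reint_with_retry(struct sketch_pool_child *child, int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (sketch_reint_attempt(child) == SKETCH_OK)
			return attempt;
		/* Stands in for waiting while setup continues. */
		sketch_setup_tick(child);
	}
	return SKETCH_RETRY;
}
```

In this sketch a manual reintegrate request that arrives early costs only a few extra attempts; it succeeds as soon as the simulated setup finishes, and still surfaces the retry-able status if the attempt budget runs out first.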

Steps for the author:

  • [ ] Commit message follows the guidelines.
  • [ ] Appropriate Features or Test-tag pragmas were used.
  • [ ] Appropriate Functional Test Stages were run.
  • [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).

NiuYawei, May 14 '25 03:05

Ticket title is 'Engine aborts while reintegrating an SSD that is replaced online'
Status is 'In Review'
Labels: 'md_on_ssd'
https://daosio.atlassian.net/browse/DAOS-17547

github-actions[bot], May 14 '25 03:05