DAOS-17661 control: Maintain hugepage allocations with nvme-rebind
The dmg storage nvme-rebind command is used during non-VMD hotplug when a "new" SSD is hot-plugged into a slot that previously held a faulty SSD. Errors creating a new SPDK I/O channel during dmg storage replace nvme have been attributed to the nvme-rebind call inadvertently shrinking the kernel hugepage allocation used by SPDK. This change fixes the problem by maintaining the number of allocated hugepages across the nvme-rebind call.
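As an illustration only (this is not the code in this PR), the idea of "maintaining" the allocation can be sketched as: read the current HugePages_Total from /proc/meminfo before the rebind, then never request fewer pages than that. The function names below are hypothetical, and how the resulting count is handed to the SPDK setup step is left out because it depends on the control-plane implementation.

```python
import re


def parse_hugepages_total(meminfo_text: str) -> int:
    """Return the HugePages_Total value from /proc/meminfo-style text."""
    match = re.search(r"^HugePages_Total:\s+(\d+)\s*$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("HugePages_Total not found in meminfo text")
    return int(match.group(1))


def pages_to_request(current_total: int, requested: int) -> int:
    """Hypothetical helper: never ask for fewer hugepages than already allocated."""
    return max(current_total, requested)


if __name__ == "__main__":
    # Read the live allocation on a Linux host and show what would be requested
    # if the rebind path itself asked for zero pages.
    with open("/proc/meminfo") as f:
        total = parse_hugepages_total(f.read())
    print(pages_to_request(total, 0))
```

With this guard, a rebind that would otherwise request a small or zero page count leaves the existing allocation untouched, while a larger request can still grow it.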
Features: control
Steps for the author:
- [ ] Commit message follows the guidelines.
- [ ] Appropriate Features or Test-tag pragmas were used.
- [ ] Appropriate Functional Test Stages were run.
- [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
- [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.
After all prior steps are complete:
- [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).
Ticket title is 'Command to rebind NVMe SSD to userspace driver shrinks hugepage allocation'. Status is 'In Review'. Labels: 'SPDK,hotplug'. https://daosio.atlassian.net/browse/DAOS-17661
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/2/execution/node/1382/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/2/execution/node/1337/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/2/execution/node/1427/log
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16493/4/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/4/execution/node/1485/log
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16493/5/testReport/
Apologies for the force-push; I couldn't get the child PRs in the stack merged cleanly. There are no changes, as the rebase applied cleanly with no conflicts. TIA
This PR is needed for non-VMD hotplug (CP req).
Test stage Build RPM on EL 8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/13/execution/node/307/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/13/execution/node/306/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/13/execution/node/322/log
https://jenkins.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16493/15/ passed all CI test stages
reviews please
CI run 16 passed all test stages.