DAOS-15967 control: Raise RAS event if link speed|width is downgraded

Open tanabarr opened this issue 1 year ago • 6 comments

Features: control
Required-githooks: true

Before requesting gatekeeper:

  • [ ] Two review approvals and any prior change requests have been resolved.
  • [ ] Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • [ ] Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • [ ] Commit messages follow the guidelines outlined here.
  • [ ] Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • [ ] You are the appropriate gatekeeper to be landing the patch.
  • [ ] The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • [ ] Githooks were used. If not, request that user install them and check copyright dates.
  • [ ] Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • [ ] All builds have passed. Check non-required builds for any new compiler warnings.
  • [ ] Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • [ ] If applicable, the PR has addressed any potential version compatibility issues.
  • [ ] Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • [ ] Extra checks if forced landing is requested
    • [ ] Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • [ ] No new NLT or valgrind warnings. Check the classic view.
    • [ ] Quick-build or Quick-functional is not used.
  • [ ] Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

tanabarr avatar Jun 28 '24 14:06 tanabarr

Ticket title is 'Raise RAS event if NVMe device link speed or width unexpected'
Status is 'In Review'
Labels: 'ALCF,usability'
https://daosio.atlassian.net/browse/DAOS-15967

github-actions[bot] avatar Jun 28 '24 14:06 github-actions[bot]

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14665/1/execution/node/1416/log

daosbuild1 avatar Jun 28 '24 18:06 daosbuild1

These new events should probably be added to the "Event list" section of the following documentation page: https://github.com/daos-stack/daos/blob/master/docs/admin/administration.md

knard38 avatar Jul 01 '24 07:07 knard38

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14665/1/execution/node/1514/log

daosbuild1 avatar Jul 03 '24 05:07 daosbuild1

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14665/1/execution/node/1607/log

daosbuild1 avatar Jul 03 '24 05:07 daosbuild1

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14665/3/testReport/

daosbuild1 avatar Jul 03 '24 21:07 daosbuild1

Thinking more about this, I find myself wondering, does raising a RAS event make sense if it can only be generated when a user triggers a scan? I'm wondering whether it would be better to include the speed/width information in the health scan output, if noticing the issue is contingent on the user running a command anyway.

That said, I'm not against the PR as-is. Just wanted to bring up the question, as generating a RAS event during the scan just feels like a strange way to notify about a result of a scan.

kjacque avatar Jul 09 '24 21:07 kjacque
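
For readers following the discussion, the check at issue can be sketched roughly as below. This is an illustrative sketch only, not the code in this PR: the `LinkStat` type, its field names, and the `Downgrades` method are hypothetical, and the sketch assumes the maximum and negotiated link speed and width are already available for each NVMe device. The returned messages could feed either a RAS event or the health scan output, which is the choice raised in the comment above.

```go
package main

import "fmt"

// LinkStat holds PCIe link attributes for one NVMe device.
// Hypothetical type for illustration; not part of the DAOS code base.
type LinkStat struct {
	MaxSpeedGTs float64 // maximum supported link speed in GT/s
	NegSpeedGTs float64 // negotiated (current) link speed in GT/s
	MaxWidth    uint    // maximum supported number of lanes
	NegWidth    uint    // negotiated (current) number of lanes
}

// Downgrades returns one message per attribute whose negotiated value
// is below the maximum the device supports. An empty result means the
// link is running at full speed and width.
func (ls LinkStat) Downgrades() []string {
	var msgs []string
	if ls.NegSpeedGTs < ls.MaxSpeedGTs {
		msgs = append(msgs, fmt.Sprintf(
			"link speed downgraded: max %g GT/s, negotiated %g GT/s",
			ls.MaxSpeedGTs, ls.NegSpeedGTs))
	}
	if ls.NegWidth < ls.MaxWidth {
		msgs = append(msgs, fmt.Sprintf(
			"link width downgraded: max x%d, negotiated x%d",
			ls.MaxWidth, ls.NegWidth))
	}
	return msgs
}

func main() {
	// Example device: negotiated speed is half the maximum, width is unchanged.
	dev := LinkStat{MaxSpeedGTs: 16, NegSpeedGTs: 8, MaxWidth: 4, NegWidth: 4}
	for _, m := range dev.Downgrades() {
		// A caller could raise an event here or append to scan output.
		fmt.Println(m)
	}
}
```

Keeping the comparison separate from the reporting path means the same result can be surfaced either way without changing the detection logic.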

Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14665/6/execution/node/501/log

daosbuild1 avatar Jul 10 '24 14:07 daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14665/6/execution/node/282/log

daosbuild1 avatar Jul 10 '24 14:07 daosbuild1

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14665/6/execution/node/569/log

daosbuild1 avatar Jul 10 '24 14:07 daosbuild1

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14665/7/testReport/

daosbuild1 avatar Jul 11 '24 22:07 daosbuild1

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14665/7/execution/node/1189/log

daosbuild1 avatar Jul 12 '24 02:07 daosbuild1

Reviewers: apologies for the force push. I had trouble merging master; there are no changes in the merge.

tanabarr avatar Jul 12 '24 09:07 tanabarr

https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/PR-14665/8/pipeline
CI passed; I'm not sure why it is not being recognised on the GH page.

tanabarr avatar Jul 18 '24 20:07 tanabarr