DAOS-13205 control: Detect stale interactive check reports

Open kjacque opened this issue 4 months ago • 19 comments

Due to limitations of the checker, the user can't act on unresolved interactive findings from an older checker instance.

When a new checker instance starts:

  • Remove unresolved interactive findings that will be re-discovered during the checker run (whole system or requested pool).
  • For unresolved findings that won't be re-discovered (e.g. checker starts on a different pool), change the action to STALE, but continue displaying the findings in the interface.
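The two rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Finding` type, the `Action` values other than STALE, and the `reconcileOnStart` function are hypothetical names, and an empty pool list is assumed to mean a whole-system check.

```go
package main

import "fmt"

// Action is a hypothetical stand-in for the action recorded on a finding.
type Action string

const (
	ActionInteract Action = "INTERACT" // assumed name: waiting for a user decision
	ActionStale    Action = "STALE"    // left over from an older checker instance
	ActionRepaired Action = "REPAIRED" // assumed name: any already-resolved outcome
)

// Finding is a simplified, hypothetical report record.
type Finding struct {
	Seq      uint64
	PoolUUID string
	Action   Action
}

// reconcileOnStart applies the two rules to findings left by an older
// checker instance. pools lists the pools the new instance will check;
// an empty slice is assumed to mean the whole system.
func reconcileOnStart(old []Finding, pools []string) []Finding {
	inScope := func(pool string) bool {
		if len(pools) == 0 {
			return true // whole-system check covers every pool
		}
		for _, p := range pools {
			if p == pool {
				return true
			}
		}
		return false
	}

	kept := make([]Finding, 0, len(old))
	for _, f := range old {
		if f.Action != ActionInteract {
			// Already resolved: leave untouched.
			kept = append(kept, f)
			continue
		}
		if inScope(f.PoolUUID) {
			// Will be re-discovered by the new run: drop the old report.
			continue
		}
		// Won't be re-discovered: keep it visible, but mark it STALE.
		f.Action = ActionStale
		kept = append(kept, f)
	}
	return kept
}

func main() {
	old := []Finding{
		{Seq: 1, PoolUUID: "pool-a", Action: ActionInteract},
		{Seq: 2, PoolUUID: "pool-b", Action: ActionInteract},
		{Seq: 3, PoolUUID: "pool-a", Action: ActionRepaired},
	}
	// New checker instance started on pool-a only.
	for _, f := range reconcileOnStart(old, []string{"pool-a"}) {
		fmt.Printf("seq=%d pool=%s action=%s\n", f.Seq, f.PoolUUID, f.Action)
	}
}
```

In this example the unresolved finding on pool-a is dropped, the unresolved finding on pool-b is kept with its action changed to STALE, and the already-resolved finding is left as it was.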

Features: control recovery

Steps for the author:

  • [x] Commit message follows the guidelines.
  • [x] Appropriate Features or Test-tag pragmas were used.
  • [x] Appropriate Functional Test Stages were run.
  • [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • [x] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).

Oct 17 '25 00:10 kjacque

Ticket title is '"dmg check query" output stale interaction request'
Status is 'In Review'
Labels: 'scrubbed_2.8'
https://daosio.atlassian.net/browse/DAOS-13205

Oct 17 '25 00:10 github-actions[bot]

Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/1/execution/node/1073/log

Oct 17 '25 10:10 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16988/3/testReport/

Oct 19 '25 08:10 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/3/execution/node/1347/log

Oct 19 '25 12:10 daosbuild3

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16988/4/testReport/

Oct 29 '25 04:10 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/4/execution/node/1373/log

Oct 29 '25 17:10 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/4/execution/node/1359/log

Oct 29 '25 18:10 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/8/execution/node/1234/log

Nov 07 '25 11:11 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/8/execution/node/1415/log

Nov 08 '25 02:11 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/9/execution/node/1316/log

Nov 24 '25 15:11 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/9/execution/node/1335/log

Nov 24 '25 16:11 daosbuild3

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16988/11/testReport/

Nov 25 '25 23:11 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16988/11/testReport/

Nov 27 '25 03:11 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/14/execution/node/1351/log

Dec 05 '25 18:12 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/14/execution/node/1392/log

Dec 05 '25 20:12 daosbuild3

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16988/15/testReport/

Dec 09 '25 01:12 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16988/15/testReport/

Dec 09 '25 13:12 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/15/execution/node/1398/log

Dec 09 '25 14:12 daosbuild3

Unrelated test failures:

  • DAOS-17951 - test_snapshot_aggregation failure during pool create due to invalid rank
  • DAOS-16759 - test_extend_simple SIGTERM during rebuild
  • DAOS-18343 - NLT valgrind issue is a false positive with Go runtime

Dec 10 '25 18:12 kjacque

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16988/17/testReport/

Dec 16 '25 21:12 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/17/execution/node/1324/log

Dec 17 '25 12:12 daosbuild3

Test failures:

  • NLT: Known issues with the Go runtime. These are showing up more frequently now.
  • test_dangling_rank_entry: Existing issue in master: https://daosio.atlassian.net/browse/DAOS-18018
  • test_lost_majority_pool_replicas: Existing issue seen in daily tests: https://daosio.atlassian.net/browse/DAOS-17788

Dec 19 '25 20:12 kjacque

@shimizukko I would still like your input. Please let me know if any additional coverage is needed based on the ftest changes I made. I can address test improvements in a follow-on PR after the holiday break.

Dec 19 '25 20:12 kjacque

There was a minor merge conflict in src/tests/ftest/recovery/check_start_options.yaml, so I resolved it via merge.

Dec 22 '25 18:12 daltonbohning

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16988/18/execution/node/1351/log

Dec 23 '25 17:12 daosbuild3