DAOS-17639 test: Detect all server fabric_ifaces
Launch.py will detect all of the fastest interfaces common to all the specified server hosts and use them to populate the engine fabric_iface entries if no overrides are provided in the test yaml.
Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: IorSmall
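As a rough illustration of the selection described above (not the actual launch.py code), the sketch below picks the fastest interfaces that exist on every server host. The function name, the input layout (host name mapped to interface name and speed), and the speed values are assumptions made for the example.

```python
def fastest_common_interfaces(host_ifaces):
    """Return the fastest interface names present on every host.

    host_ifaces is a hypothetical mapping: host -> {interface name: speed}.
    """
    if not host_ifaces:
        return []
    # Keep only the interface names that every host reports.
    common = set.intersection(*(set(ifaces) for ifaces in host_ifaces.values()))
    if not common:
        return []
    # Treat the slowest speed seen for a name across hosts as its usable speed.
    usable = {
        name: min(ifaces[name] for ifaces in host_ifaces.values())
        for name in common
    }
    top_speed = max(usable.values())
    # Sort so the result is stable from run to run.
    return sorted(name for name, speed in usable.items() if speed == top_speed)


# Illustrative data only; the speeds are made up.
hosts = {
    "hdr-232": {"ib_cpu0_0": 200000, "ib_cpu1_0": 200000, "eth0": 1000},
    "hdr-233": {"ib_cpu0_0": 200000, "ib_cpu1_0": 200000, "eth0": 1000, "eth1": 1000},
}
selected = fastest_common_interfaces(hosts)
```

With this input both ib devices are selected, and eth1 is never considered because it is missing from one host.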
Steps for the author:
- [ ] Commit message follows the guidelines.
- [ ] Appropriate Features or Test-tag pragmas were used.
- [ ] Appropriate Functional Test Stages were run.
- [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
- [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.
After all prior steps are complete:
- [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).
Ticket title is 'Support newly named ib devices for functional tests'
Status is 'In Review'
Labels: 'testp1'
https://daosio.atlassian.net/browse/DAOS-17639
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/4/execution/node/557/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/3/execution/node/805/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/6/execution/node/747/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/5/execution/node/805/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/7/execution/node/805/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/10/execution/node/665/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/11/execution/node/557/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/14/execution/node/895/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/14/execution/node/954/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/14/execution/node/909/log
Failures seen in https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-16913/14/testReport/ are known issues or should not be related to PR changes - in all cases the servers started successfully:
- 2-./container/boundary.py:BoundaryTest.test_container_boundary - https://daosio.atlassian.net/browse/DAOS-18040
- 1-./erasurecode/multiple_rank_failure.py:EcodOnlineMultiRankFail.test_ec_multiple_rank_failure - https://daosio.atlassian.net/browse/DAOS-16339
- 1-./soak/smoke.py:SoakSmoke.test_soak_smoke - https://daosio.atlassian.net/browse/DAOS-18043
- 6-./nvme/enospace.py:NvmeEnospace.test_enospace_no_aggregation - This test suppose to fail because of DER_NOSPACE but it got Passed
- 4-./recovery/pool_list_consolidation.py:PoolListConsolidationTest.test_lost_majority_ps_replicas -
- 19-./daos_test/suite.py:DaosCoreTest.test_daos_rebuild_simple_interactive - timeout waiting for rebuild
- 24-./daos_test/suite.py:DaosCoreTest.test_daos_rebuild_ec - https://daosio.atlassian.net/browse/DAOS-17657
- 1-./recovery/cat_recov_core.py:CatRecovCoreTest.test_daos_cat_recov_core - https://daosio.atlassian.net/browse/DAOS-17977
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/15/execution/node/553/log
While we still need to resolve the IOMMU issue on hdr-233, e.g.
2025-10-06 15:04:11,977 server_utils L0592 INFO | Resetting DAOS server storage: /usr/bin/daos_server nvme reset --ignore-config
2025-10-06 15:04:11,977 run_utils L0481 DEBUG| Running on hdr-[232-233] with a 120 second timeout: export COVFILE=/tmp/test.cov; /usr/bin/daos_server nvme reset --ignore-config
2025-10-06 15:04:16,895 run_utils L0343 DEBUG| hdr-232 (rc=0): <no output>
2025-10-06 15:04:16,895 run_utils L0347 DEBUG| hdr-233 (rc=1):
2025-10-06 15:04:16,895 run_utils L0352 DEBUG| ERROR: processing request parameters: storage: code = 311 description = "IOMMU capability is required to access NVMe devices but no IOMMU capability detected"
2025-10-06 15:04:16,895 run_utils L0352 DEBUG| ERROR: storage: code = 311 resolution = "enable IOMMU per the DAOS Admin Guide"
We were able to get a few tests to pass on the hdr-23 cluster, such as 1-./container/snapshot_aggregation.py:SnapshotAggregation.test_snapshot_aggregation, which used a server config containing:
engines:
- fabric_iface: ib_cpu0_0
  fabric_iface_port: 31317
- fabric_iface: ib_cpu1_0
  fabric_iface_port: 31417
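For reference, the engine entries above can be reproduced from a list of detected interface names with a small sketch like the one below. The helper name and the defaults (base port 31317, a step of 100 per engine) are assumptions inferred from the two ports in this config, not values taken from launch.py.

```python
def build_engine_entries(interfaces, base_port=31317, port_step=100):
    """Build one engine entry per interface.

    base_port and port_step are assumed defaults for this example.
    """
    return [
        {"fabric_iface": name, "fabric_iface_port": base_port + index * port_step}
        for index, name in enumerate(interfaces)
    ]


engines = build_engine_entries(["ib_cpu0_0", "ib_cpu1_0"])
```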
hdr-233 has been fixed so that VT-d is now actually enabled.
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/18/execution/node/553/log
Now hdr-234 is reporting a problem:
2025-10-07 06:58:54,617 run_utils L0481 DEBUG| Running on hdr-[232-234] with a 120 second timeout: export COVFILE=/tmp/test.cov; /usr/bin/daos_server nvme reset --ignore-config
2025-10-07 06:58:59,539 run_utils L0343 DEBUG| hdr-[232-233] (rc=0): <no output>
2025-10-07 06:58:59,539 run_utils L0347 DEBUG| hdr-234 (rc=1):
2025-10-07 06:58:59,539 run_utils L0352 DEBUG| ERROR: processing request parameters: storage: code = 311 description = "IOMMU capability is required to access NVMe devices but no IOMMU capability detected"
2025-10-07 06:58:59,539 run_utils L0352 DEBUG| ERROR: storage: code = 311 resolution = "enable IOMMU per the DAOS Admin Guide"
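A quick way to spot hosts in this state before a run is to check whether the kernel exposes any IOMMU devices. The sketch below assumes that a populated /sys/class/iommu directory on Linux means an IOMMU is active. Whether that matches the exact check daos_server performs is also an assumption, and the helper name is hypothetical.

```python
import os


def iommu_enabled(sysfs_dir="/sys/class/iommu"):
    """Return True if the given sysfs directory lists at least one IOMMU device."""
    try:
        return len(os.listdir(sysfs_dir)) > 0
    except OSError:
        # A missing or unreadable directory is treated as "no IOMMU detected".
        return False
```

Running this on each server host (for example through the same remote command mechanism the tests use) would flag nodes where the BIOS setting is still disabled.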
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/17/execution/node/943/log
Found more nodes with VT-d disabled and made sure it is now enabled on all hdr-23x systems.
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/17/execution/node/971/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/17/execution/node/957/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/19/execution/node/911/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/19/execution/node/970/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/19/execution/node/925/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/20/execution/node/553/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/22/execution/node/954/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/22/execution/node/909/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/22/execution/node/864/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/23/execution/node/519/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16913/23/execution/node/429/log