[BUG] Can't Enable SR-IOV GPU in UI
Describe the bug
You can't enable SR-IOV GPU in the UI
To Reproduce Steps to reproduce the behavior:
- Enable both
pcidevices-controllerandnvidia-driver-toolkitaddon on a system with a GPU installed - Navigate to SR-IOV GPU Devices in the left sidebar
- Enable one of the GPUS by clicking the three dots on the right and selecting enable Expected behavior
The GPU should enable and be available as an option in vGPU devices
Support bundle
supportbundle_b580a14b-9c0c-465c-ad49-dcdde8cfbb84_2024-02-06T01-37-41Z.zip
Environment
- Harvester ISO version: v1.3.0-rc1
- Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): 2 node DL360 bare metal
Additional context
Found while testing #2764 .
It's worth noting that on this cluster there are 2 nodes. Both have GPUs, but one them shouldn't support this feature. The issue might be related to the feature check on the chipset.
cc @ibrokethecloud
There are 2 issues:
- nvidia driver toolkit addon needs a UI enhancement to allow users to specify an override image and location for GPU driver. Currently enabling the addon without editing the yaml results in an invalid endpoint definition as a result of which the driver is never installed. @torchiaf need your assistance on this.
- the 2nd node in the cluster has a NVIDIA GPU which only supports MIG mode vGPU. These GPU's should be ignored by the pcidevices controller and the PR: https://github.com/harvester/pcidevices/pull/65 addresses the same.
Pre Ready-For-Testing Checklist
-
[ ] If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted? The HEP PR is at:
-
[ ] Where is the reproduce steps/test steps documented? The reproduce steps/test steps are at:
-
[ ] Is there a workaround for the issue? If so, where is it documented? The workaround is at:
-
[ ] Have the backend code been merged (harvester, harvester-installer, etc) (including
backport-needed/*)? The PR is at:-
[ ] Does the PR include the explanation for the fix or the feature?
-
[ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart? The PR for the YAML change is at: The PR for the chart change is at:
-
-
[ ] If labeled: area/ui Has the UI issue filed or ready to be merged? The UI issue/PR is at:
-
[ ] If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged? The documentation/KB PR is at:
-
[ ] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?
- The automation skeleton PR is at:
- The automation test case PR is at:
-
[ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label
release/obsolete-compatibility? The compatibility issue is filed at:
Automation e2e test issue: harvester/tests#1159
Validated in master-a2c98e96-head (build right after v1.3.0-rc4). Closing as fixed.