How can I hide a subset of GPUs?
1. Issue or feature description
I'd like to allocate only a subset of the GPUs on my nodes to containers, but I can't find a way to do it.
The envvar NVIDIA_VISIBLE_DEVICES does not seem to work properly, as mentioned in #197 and #236.
Even though I set NVIDIA_VISIBLE_DEVICES to a subset of GPU IDs such as 0,1, all GPUs are still being scheduled to containers.
Please correct me if I'm missing something. Thank you!
2. Steps to reproduce the issue
Component Versions
- k8s: v1.21
- gpu-operator: v22.9.2

- You can see the NVIDIA_VISIBLE_DEVICES set properly in values.yaml:
$ vi values.yaml
...
devicePlugin:
  enabled: true
  repository: nvcr.io/nvidia
  image: k8s-device-plugin
  version: v0.13.0-ubi8
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  args: []
  env:
    - name: PASS_DEVICE_SPECS
      value: "true"
    - name: FAIL_ON_INIT_ERROR
      value: "true"
    - name: DEVICE_LIST_STRATEGY
      value: envvar
    - name: DEVICE_ID_STRATEGY
      value: uuid
    - name: NVIDIA_VISIBLE_DEVICES
      value: "0,1"
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
  resources: {}
...
- Install gpu-operator by executing the following:
$ helm install -n gpu-operator gpu-operator ./ -f ./values.yaml --set psp.enabled=true
W0822 17:43:42.630182 3461180 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0822 17:43:42.683118 3461180 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: gpu-operator
LAST DEPLOYED: Tue Aug 22 17:43:42 2023
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
$
- You can find all the corresponding Pods running well:
$ k -n gpu-operator get all
NAME READY STATUS RESTARTS AGE
pod/gpu-feature-discovery-bf8jn 1/1 Running 0 6m36s
pod/gpu-operator-59db9d5cfb-bssq7 1/1 Running 0 7m21s
pod/gpu-operator-node-feature-discovery-master-59b4b67f4f-qbjqj 1/1 Running 0 7m21s
pod/gpu-operator-node-feature-discovery-worker-92n9t 1/1 Running 0 7m21s
pod/gpu-operator-node-feature-discovery-worker-9t7ft 1/1 Running 0 7m21s
pod/gpu-operator-node-feature-discovery-worker-hbmtb 1/1 Running 0 7m21s
pod/gpu-operator-node-feature-discovery-worker-r972t 1/1 Running 0 7m21s
pod/gpu-operator-node-feature-discovery-worker-s8sv9 1/1 Running 0 7m21s
pod/gpu-operator-node-feature-discovery-worker-ttwrg 1/1 Running 0 7m21s
pod/gpu-operator-node-feature-discovery-worker-xbgpn 1/1 Running 0 7m21s
pod/nvidia-container-toolkit-daemonset-c4hn6 1/1 Running 0 6m36s
pod/nvidia-cuda-validator-zskzg 0/1 Completed 0 6m12s
pod/nvidia-dcgm-exporter-52pcn 1/1 Running 0 6m36s
pod/nvidia-device-plugin-daemonset-qtjkb 1/1 Running 0 6m36s
pod/nvidia-device-plugin-validator-8b4ps 0/1 Completed 0 5m56s
pod/nvidia-mig-manager-c2s22 1/1 Running 0 6m36s
pod/nvidia-operator-validator-bltcz 1/1 Running 0 6m36s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dcgm-exporter NodePort 10.233.3.232 <none> 9400:30081/TCP 173d
service/gpu-operator ClusterIP 10.233.32.26 <none> 8080/TCP 6m36s
service/gpu-operator-node-feature-discovery-master ClusterIP 10.233.59.128 <none> 8080/TCP 7m21s
service/nvidia-dcgm-exporter ClusterIP 10.233.33.249 <none> 9400/TCP 6m36s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/gpu-feature-discovery 1 1 1 1 1 nvidia.com/gpu.deploy.gpu-feature-discovery=true 6m36s
daemonset.apps/gpu-operator-node-feature-discovery-worker 7 7 7 7 7 <none> 7m21s
daemonset.apps/nvidia-container-toolkit-daemonset 1 1 1 1 1 nvidia.com/gpu.deploy.container-toolkit=true 6m36s
daemonset.apps/nvidia-dcgm-exporter 1 1 1 1 1 nvidia.com/gpu.deploy.dcgm-exporter=true 6m36s
daemonset.apps/nvidia-device-plugin-daemonset 1 1 1 1 1 nvidia.com/gpu.deploy.device-plugin=true 6m36s
daemonset.apps/nvidia-mig-manager 1 1 1 1 1 nvidia.com/gpu.deploy.mig-manager=true 6m36s
daemonset.apps/nvidia-operator-validator 1 1 1 1 1 nvidia.com/gpu.deploy.operator-validator=true 6m36s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/gpu-operator 1/1 1 1 7m21s
deployment.apps/gpu-operator-node-feature-discovery-master 1/1 1 1 7m21s
NAME DESIRED CURRENT READY AGE
replicaset.apps/gpu-operator-59db9d5cfb 1 1 1 7m21s
replicaset.apps/gpu-operator-node-feature-discovery-master-59b4b67f4f 1 1 1 7m21s
- You can also find the envvar NVIDIA_VISIBLE_DEVICES set properly in the device-plugin daemonset definition:
$ k -n gpu-operator describe ds nvidia-device-plugin-daemonset
...
Environment:
PASS_DEVICE_SPECS: true
FAIL_ON_INIT_ERROR: true
DEVICE_LIST_STRATEGY: envvar
DEVICE_ID_STRATEGY: uuid
NVIDIA_VISIBLE_DEVICES: 0,1
...
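To rule out a mismatch between the daemonset spec and the running container, the value can also be checked inside the plugin pod itself; a minimal sketch, assuming your kubectl supports exec on a ds/ target and that env is present in the image:
$ k -n gpu-operator exec ds/nvidia-device-plugin-daemonset -- env | grep NVIDIA_VISIBLE_DEVICES
NVIDIA_VISIBLE_DEVICES=0,1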
- But STILL the number of allocatable GPUs on the node is 8:
$ k describe no gpu-node-01
Capacity:
...
nvidia.com/gpu: 8
Allocatable:
...
nvidia.com/gpu: 8
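The same number can be read directly with jsonpath (node name taken from the describe output above; the backslash escapes the dots in the extended resource name):
$ k get node gpu-node-01 -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
8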
- And a Pod is still able to have more than one GPU, as follows (a minimal example Pod spec is sketched after the nvidia-smi output):
$ k describe po mychat-57f6d88d96-strp5
...
Limits:
nvidia.com/gpu: 4
Requests:
nvidia.com/gpu: 4
...
$ k exec -ti mychat-57f6d88d96-strp5 -- bash
root@mychat-57f6d88d96-strp5:/opt# nvidia-smi
Tue Aug 22 08:56:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08 Driver Version: 510.73.08 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... On | 00000000:07:00.0 Off | 0 |
| N/A 34C P0 68W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... On | 00000000:0B:00.0 Off | 0 |
| N/A 35C P0 71W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM... On | 00000000:C8:00.0 Off | 0 |
| N/A 35C P0 67W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM... On | 00000000:CB:00.0 Off | 0 |
| N/A 35C P0 67W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
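For completeness, a minimal sketch of how such a Pod requests GPUs; the name and image are placeholders, not the actual mychat deployment:
$ cat <<'EOF' | k apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                  # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:11.6.2-base-ubuntu20.04   # assumed CUDA base image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 4       # requests 4 of the GPUs advertised by the device plugin
EOF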
The primary issue here is that the device plugin is started in privileged mode to have access to the device nodes for enumeration. This means that the NVIDIA_VISIBLE_DEVICES environment variable has no effect -- except that it ensures that the required libraries for enumerating the devices are mounted into the container.
Allowing for a set of devices to be selected for use with the device plugin is something that we have discussed internally, but we don't have a timeline or concrete plan of implementation.
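A quick way to confirm this on a given cluster (a sketch; the device-plugin container is assumed to be the first container in the pod template):
$ k -n gpu-operator get ds nvidia-device-plugin-daemonset -o jsonpath='{.spec.template.spec.containers[0].securityContext.privileged}'
true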
If you remove the env NVIDIA_MIG_MONITOR_DEVICES from the nvidia-device-plugin-daemonset daemonset, you can switch to privileged: false, and then NVIDIA_VISIBLE_DEVICES works for me.
I don’t have cards with MIG support.
I’m able to use an arbitrary subset of the GPUs on my node: describe node reports only 2 of my GPUs, and pods are scheduled only on those GPUs.
# gpu-operator-v23.9.1
driver:
  enabled: false
migManager:
  enabled: false
mig:
  strategy: single
toolkit:
  enabled: true
nfd:
  enabled: true
devicePlugin:
  env:
    - name: PASS_DEVICE_SPECS
      value: "true"
    - name: FAIL_ON_INIT_ERROR
      value: "true"
    - name: DEVICE_LIST_STRATEGY
      value: envvar
    - name: DEVICE_ID_STRATEGY
      value: uuid
    - name: NVIDIA_VISIBLE_DEVICES
      value: GPU-XXX,GPU-XXX
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
Maybe I should do the same thing to the gpu-feature-discovery daemonset.
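For reference, a sketch of the manual steps this relies on; the GPU models and UUIDs stay as placeholders, and the operator may reconcile manual daemonset edits away:
# List the GPU UUIDs on the node to build the NVIDIA_VISIBLE_DEVICES whitelist
$ nvidia-smi -L
GPU 0: NVIDIA ... (UUID: GPU-XXX)
GPU 1: NVIDIA ... (UUID: GPU-XXX)
...
# Then drop NVIDIA_MIG_MONITOR_DEVICES and set privileged: false on the
# device-plugin container, e.g. by editing the daemonset directly
$ k -n gpu-operator edit ds nvidia-device-plugin-daemonset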
@Baenimyr is this configuration for the nvidia device plugin? How do I make just one node hide a subset of GPUs?
With NVIDIA_VISIBLE_DEVICES and privileged: false, the configuration is the same for all nodes because nvidia-device-plugin is a daemonset. NVIDIA card UUIDs are unique, so NVIDIA_VISIBLE_DEVICES is the list of all the visible cards for the whole cluster. It is not a blacklist but a whitelist.
I assume you don’t have two nodes sharing the same machine while trying to allocate a distinct set of GPUs to each node; running two nodes on the same machine would be a bad idea.
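A sketch of how to check what each node ends up advertising after the change (the backslash escapes the dots in the extended resource name):
$ k get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"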