k8s-device-plugin
Can't understand why I need nvidia-device-plugin
Hi! I've read all the k8s docs, I've read all the local docs about the plugin itself, I understand what the nvidia-container-runtime is, I've tried deploying this device plugin and the device plugin from GCP, and I have no questions about how to deploy it. But...
I completely can't understand why I need it. Maybe I'm missing something. Let me show you.
I have a 1.14 cluster and a bare-metal node with a Tesla K40c and the NVIDIA/CUDA drivers installed.
Here is my nvidia-smi output:
==============NVSMI LOG==============
Timestamp : Wed Sep 15 21:42:05 2021
Driver Version : 440.64.00
CUDA Version : 10.2
Attached GPUs : 1
GPU 00000000:84:00.0
Product Name : Tesla K40c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0321816019511
GPU UUID : GPU-afa1c01a-3776-2166-5689-cc8ef444f42b
Minor Number : 0
VBIOS Version : 80.80.65.00.03
MultiGPU Board : No
Board ID : 0x8400
GPU Part Number : 900-22081-0350-000
Inforom Version
Image Version : 2081.0206.01.04
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x84
Device : 0x00
Domain : 0x0000
Device Id : 0x102410DE
Bus Id : 00000000:84:00.0
Sub System Id : 0x0983103C
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 25 %
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : N/A
HW Power Brake Slowdown : N/A
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11441 MiB
Used : 2559 MiB
Free : 8882 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
CBU : N/A
Total : 0
Aggregate
Single Bit
Device Memory : 5
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
CBU : N/A
Total : 5
Double Bit
Device Memory : 2
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
CBU : N/A
Total : 2
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 1
Pending Page Blacklist : No
Temperature
GPU Current Temp : 50 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 68.75 W
Power Limit : 235.00 W
Default Power Limit : 235.00 W
Enforced Power Limit : 235.00 W
Min Power Limit : 180.00 W
Max Power Limit : 235.00 W
Clocks
Graphics : 745 MHz
SM : 745 MHz
Memory : 3004 MHz
Video : 540 MHz
Applications Clocks
Graphics : 745 MHz
Memory : 3004 MHz
Default Applications Clocks
Graphics : 745 MHz
Memory : 3004 MHz
Max Clocks
Graphics : 875 MHz
SM : 875 MHz
Memory : 3004 MHz
Video : 540 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 16885
Type : C
Name : python3
Used GPU Memory : 1272 MiB
Process ID : 16910
Type : C
Name : python3
Used GPU Memory : 1272 MiB
Docker 19.03 along with nvidia-container-runtime is installed and configured:
dpkg -l | grep docker
ii docker-ce 5:19.03.8~3-0~debian-stretch amd64 Docker: the open-source application container engine
ii docker-ce-cli 5:19.03.15~3-0~debian-stretch amd64 Docker CLI: the open-source application container engine
cat /etc/docker/daemon.json
{
  "live-restore": true,
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
nvidia-container-cli -V
version: 1.3.3
build date: 2021-02-05T13:30+00:00
build revision: bd9fc3f2b642345301cb2e23de07ec5386232317
build compiler: x86_64-linux-gnu-gcc-6 6.3.0 20170516
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
My setup works
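As a quick sanity check (the cuda image tag below is just an example of a tag that matches my driver, any CUDA 10.x base image should do), nvidia-smi runs fine inside a plain docker container:
# default-runtime is already "nvidia", so no --runtime flag is needed;
# NVIDIA_VISIBLE_DEVICES=all asks libnvidia-container to expose every GPU
docker run --rm -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:10.2-base nvidia-smi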
Let me explain what I don't understand. I need a GPU resource on the node that the scheduler will use, right? OK, I can PATCH my node with an extended resource as described here.
Get the total memory from nvidia-smi:
gpu_memory="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null)"
and push it right into the node status, something like:
CONTENT_TYPE="application/json-patch+json"
RESOURCE_NAME="example.ru~1gpu_memory"
CAPACITY_PATH="/status/capacity/${RESOURCE_NAME}"
gpu_memory="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null)" #(11441)
data="[{\"op\": \"add\", \"path\": \"${CAPACITY_PATH}\", \"value\": \"${gpu_memory}\"}]"
curl \
--silent \
"${API_URL}" \
--key "${KEY}" \
--cert "${CERT}" \
--cacert "${CA}" \
--header "Content-Type: ${CONTENT_TYPE}" \
--request PATCH \
--data "${data}"
That's it! My node has the resource:
Capacity:
...
example.ru/gpu_memory: 11441
...
Here is the pod YAML:
$ cat gpu.pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/cuda:9.0-devel
    resources:
      limits:
        example.ru/gpu_memory: "128"
  - name: digits-container
    image: nvcr.io/nvidia/digits:20.12-tensorflow-py3
    resources:
      limits:
        example.ru/gpu_memory: "128"
I'm going to deploy it all without the device plugin:
k apply -f gpu.pod.yaml
I can see that the GPU has been detected, although it says the model is not supported by that version of the DIGITS image:
Containers:
cuda-container:
Container ID: docker://76683ea186d0be74124dced77ad71207decf4b9534c7b41292377956716d6e9e
Image: nvcr.io/nvidia/cuda:9.0-devel
Image ID: docker-pullable://nvcr.io/nvidia/cuda@sha256:879e34e7059ed350140bb0b40f1b1c543846ce9a2088133494b0b3495d8c92c5
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed <------------------------------
Exit Code: 0
Started: Wed, 15 Sep 2021 22:22:22 +0300
Finished: Wed, 15 Sep 2021 22:22:22 +0300
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 15 Sep 2021 22:22:10 +0300
Finished: Wed, 15 Sep 2021 22:22:10 +0300
Ready: False
Restart Count: 1
Limits:
example.ru/gpu_memory: 128
Requests:
example.ru/gpu_memory: 128
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-pv8hv (ro)
$ k logs -f gpu-pod digits-container
============
== DIGITS ==
============
NVIDIA Release 20.12 (build 17912121)
DIGITS Version 6.1.1
Container image Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
DIGITS Copyright (c) 2014-2019, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
ERROR: Detected NVIDIA Tesla K40c GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container
NOTE: Legacy NVIDIA Driver detected. Compatibility mode ENABLED.
2021-09-15 19:22:18.710270: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
___ ___ ___ ___ _____ ___
| \_ _/ __|_ _|_ _/ __|
| |) | | (_ || | | | \__ \
|___/___\___|___| |_| |___/ 6.1.1
The docs say:
The NVIDIA device plugin for Kubernetes is a Daemonset that allows you to automatically:
- Expose the number of GPUs on each node of your cluster
- Keep track of the health of your GPUs
- Run GPU enabled containers in your Kubernetes cluster.
So I can run any number of pods with GPU-enabled containers in k8s without the device plugin. Containers have all they need to work with the GPU from docker/nvidia-container-runtime, don't they? How would the device plugin help me? What advantages would it give? I'd appreciate any help, advice, links to learn from, or explanations you can give. I just want to make this clear for myself.
I would advise reading up on the device plugin framework, which should help you understand the motivation, use cases, advantages: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md
Containers have all they need to work with the GPU from docker/nvidia-container-runtime, don't they?
Yes, you are correct. The nvidia-container-toolkit stack, which includes libnvidia-container, nvidia-container-runtime, etc., is all you need to run GPU workloads in containers. The NVIDIA device plugin is specific to Kubernetes.
I need a GPU resource on the node that the scheduler will use
Yes, the device plugin makes the Kubernetes scheduler aware of gpu resources in your cluster. In your example, you manually did this. The major advantage of the device plugin is that it automates this process for all nodes and allows you to scale up/down your cluster seamlessly.
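For example, with the plugin deployed, each node advertises an nvidia.com/gpu resource and a pod simply requests whole GPUs (a minimal sketch; the image tag is just reused from your example):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/cuda:9.0-devel
    resources:
      limits:
        nvidia.com/gpu: 1   # whole GPUs only; the plugin advertises the per-node count
The plugin also watches device health, so an unhealthy GPU is removed from the node's allocatable count instead of being scheduled onto.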
@cdesiniotis thanks a lot! I've read it already. I don't understand how the device plugin provides libs to a pod's container when they have already been provided by the nvidia-container-toolkit stack. Where are vars like NVIDIA_VISIBLE_DEVICES declared? How is an app able to read and understand such a var? Do I need a special base image for that? Is it for restricting some GPU capabilities from an app? I've surfed through tens of closed and open issues and still can't understand; it is confusing me a lot.
I would advise reading up on the device plugin framework, which should help you understand the motivation, use cases, advantages: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md
Current design proposal link is https://github.com/kubernetes/design-proposals-archive/blob/acc25e14ca83dfda4f66d8cb1f1b491f26e78ffe/resource-management/device-plugin.md
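On the NVIDIA_VISIBLE_DEVICES question above: as far as I understand, the variable is consumed by the nvidia-container-runtime / libnvidia-container hook, not by the application, so no special base image is needed. The device plugin injects it into each container, based on the GPUs it allocated, via the kubelet's device plugin API; without the plugin you would have to set it yourself, e.g. (a sketch only, the device index is just an example):
spec:
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/cuda:9.0-devel
    env:
    - name: NVIDIA_VISIBLE_DEVICES   # read by the runtime hook, not by the app
      value: "0"                     # expose only GPU 0; "all" exposes every GPU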