
Consistent Throughput of Model Across Varying GPU Assignments with Nvidia GPU Operator Time-slicing


1. Quick Debug Information

  • OS/Version (e.g. RHEL8.6, Ubuntu22.04): Ubuntu20.04
  • Kernel Version: 5.15.0-89-generic
  • Container Runtime Type/Version (e.g. Containerd, CRI-O, Docker): Docker
  • K8s Flavor/Version (e.g. K8s, OCP, Rancher, GKE, EKS): K8s
  • GPU Operator Version: latest

2. Issue or feature description

I observed that when using the Nvidia GPU Operator, the throughput of a YOLO model remained constant regardless of the number of GPU replicas assigned to the pod. I ran experiments with a single YOLO workload on a machine with two Tesla K80 GPUs time-sliced into 8 replicas (4 per GPU). Despite varying the number of replicas assigned to the pod (from 1 to 8), the model's throughput did not increase as expected.

3. Steps to reproduce the issue

  1. Deploy the Nvidia GPU Operator on a Kubernetes cluster.
  2. Set up two Tesla K80 GPUs, time-sliced into 8 replicas (4 per GPU); a configuration sketch is included after this list.
  3. Create a YOLO workload pod with different numbers of GPU replicas assigned (1 to 8).
  4. Measure and compare the throughput of the YOLO model for each configuration.
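
For reference, here is a minimal sketch of the kind of configuration steps 2 and 3 describe. The ConfigMap key `any`, the names `time-slicing-config` and `yolo-test`, and the container image are placeholders, not taken from my actual setup:

```yaml
# Time-slicing config consumed by the GPU Operator's device plugin:
# each physical GPU is advertised as 4 nvidia.com/gpu replicas.
# (The ClusterPolicy's devicePlugin.config must reference this ConfigMap.)
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
---
# Test pod for the YOLO workload; the nvidia.com/gpu limit is the value
# varied between 1 and 8 in the experiment.
apiVersion: v1
kind: Pod
metadata:
  name: yolo-test
spec:
  restartPolicy: Never
  containers:
  - name: yolo
    image: yolo-benchmark:latest   # placeholder image name
    resources:
      limits:
        nvidia.com/gpu: 4          # varied from 1 to 8
```

With `replicas: 4` on a node with two physical K80s, the node advertises 8 `nvidia.com/gpu` resources in total, so any request between 1 and 8 is schedulable.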

[Attached throughput plots: Figure_2, Figure_3]

Additional Info

I'm sure that GPU time-slicing is activated: I followed the instructions, created 8 pods, and all of them reached Running status. I'm specifically interested in understanding whether GPU time-slicing gives a single pod the entire GPU when no other pods are contending for GPU resources. If not, what factors might be contributing to the constant throughput I observe regardless of the number of GPU replicas assigned?
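
For context, the activation check mentioned above (8 pods, all Running) can be expressed as a manifest along these lines. This is only a sketch; the deployment name and the CUDA sample image are assumptions rather than the exact workload I used:

```yaml
# Schedules 8 pods, each requesting one time-sliced GPU replica.
# If all 8 reach Running on a node with two K80s sliced 4x each,
# the time-slicing configuration is being honored by the scheduler.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-timeslice-check
spec:
  replicas: 8
  selector:
    matchLabels:
      app: gpu-timeslice-check
  template:
    metadata:
      labels:
        app: gpu-timeslice-check
    spec:
      containers:
      - name: cuda-sample
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
        command: ["sleep", "infinity"]   # just hold the GPU allocation
        resources:
          limits:
            nvidia.com/gpu: 1
```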

arashasg · Dec 24 '23 14:12