gpu-operator
[feature] Dynamic MIG partitioner
Would it be possible for the GPU operator to make the MIG setup transparent, so that the end user can request per-pod GPU memory on demand while the MIG configuration is dynamically re-partitioned under the hood, i.e. without any intervention from a sysadmin or DevOps team?
# requesting eight 40Gi MIG slices
resources:
  limits:
    nvidia.com/gpu: 8
    nvidia.com/gpu_memory: "40Gi"
https://www.nvidia.com/en-us/technologies/multi-instance-gpu/
MIG instances can also be dynamically reconfigured
Please see this document on why this is not feasible under the current Kubernetes resource model: Challenges Supporting Multi-Instance GPUs (MIG) in Kubernetes
Once the following newly accepted Kubernetes Enhancement proposal gets implemented, we will be able to build a device plugin that properly supports what you suggest: https://github.com/kubernetes/enhancements/pull/3064
I updated the example to emphasise the capability to perform MIG-sliced multi-GPU training, e.g. by requesting eight 40Gi MIG slices (from different GPU cards on the same DGX). AFAIK this is currently not possible, not even with a static MIG layout.
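To make the request above concrete, here is a minimal Python sketch of the translation a dynamic partitioner would have to perform: turning a memory-based request into a MIG profile. It assumes an A100 80GB profile table and treats "40Gi" as the nominal 40 GB in the profile name; the function names are hypothetical and are not part of the gpu-operator. It only derives a resource name of the form used by the device plugin's mixed MIG strategy and says nothing about whether the scheduler can actually satisfy the request.

```python
# Hypothetical sketch: map a memory-based GPU request onto a MIG profile.
# Assumption: the node has A100 80GB GPUs; other GPUs expose other profiles.
A100_80GB_PROFILES = [
    # (profile name, compute slices, nominal memory in GB)
    ("1g.10gb", 1, 10),
    ("2g.20gb", 2, 20),
    ("3g.40gb", 3, 40),
    ("4g.40gb", 4, 40),
    ("7g.80gb", 7, 80),
]


def parse_gi(quantity: str) -> int:
    """Parse a quantity such as "40Gi" into an integer number of Gi."""
    if not quantity.endswith("Gi"):
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(quantity[:-2])


def pick_profile(memory_gi: int) -> str:
    """Return the smallest profile whose nominal memory covers the request."""
    # Sort by memory first, then by compute slices, so the cheapest fit wins.
    for name, _compute, memory in sorted(
        A100_80GB_PROFILES, key=lambda p: (p[2], p[1])
    ):
        if memory >= memory_gi:
            return name
    raise ValueError(f"no MIG profile offers {memory_gi}Gi")


def to_mig_limits(limits: dict) -> dict:
    """Rewrite the proposed limits into a per-profile MIG resource request."""
    count = int(limits["nvidia.com/gpu"])
    profile = pick_profile(parse_gi(limits["nvidia.com/gpu_memory"]))
    return {f"nvidia.com/mig-{profile}": count}


requested = {"nvidia.com/gpu": 8, "nvidia.com/gpu_memory": "40Gi"}
print(to_mig_limits(requested))
```

The hard part discussed in this thread is not this lookup but re-partitioning the GPUs at scheduling time so that the chosen profile actually exists on a node.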
Once we have Dynamic Resource Allocation (DRA), all of what you propose will be possible. We do not plan to "hack" this support onto the existing plugin; instead, we will put all our effort into supporting an API like this in the new plugin for DRA.
I agree with @klueska that DRA is the right way. However, @elgalu, I do not agree that it's not feasible under the current circumstances. You are welcome to watch the following video (https://www.youtube.com/watch?v=zk7g3FbW7go), which shows that it has been achieved in Kubernetes.
Hi @klueska, I cannot wait to try this new DRA feature, but after reading the KEP I have some concerns about how the resource driver will be implemented.
What I want is not only dynamic MIG configuration but also dynamic allocation of network-attached GPUs.
As I understand it, a Resource Driver needs to define its own ResourceClaimParameter CRD, allocate and configure the devices, and interact with the kubelet to prepare devices for containers. Most of this work is device-specific and should, I believe, be handled by the device vendor, but allocation seems different and more complicated when the devices are attached dynamically over the network. How would the resource driver determine which device to attach, and how would it interact with the infrastructure to attach it? My infrastructure is built with Liqid fabric switches connecting NVIDIA GPUs and bare-metal servers. A machine can be created and reprogrammed using the Liqid management software. In this case, do I need to write a component myself that receives requests from the Resource Driver and interacts with Liqid?
Could you share NVIDIA's thoughts on how to implement a Resource Driver and how to support dynamically attached devices?
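The split of responsibilities asked about above could be pictured roughly as follows. This is purely an illustrative Python sketch, not the DRA API and not NVIDIA's design: `FabricClient`, `InMemoryFabric`, and `ResourceDriver` are hypothetical names, and the in-memory fabric stands in for whatever component would talk to the Liqid management software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class FabricClient(Protocol):
    """Hypothetical interface to a composable-fabric manager."""

    def attach_gpu(self, node: str) -> str:
        """Attach one free GPU to the node and return its device ID."""
        ...


@dataclass
class InMemoryFabric:
    """Stand-in fabric used only for this example."""

    free_gpus: List[str] = field(default_factory=lambda: ["gpu-0", "gpu-1"])

    def attach_gpu(self, node: str) -> str:
        if not self.free_gpus:
            raise RuntimeError("no free GPUs left in the fabric")
        return self.free_gpus.pop(0)


@dataclass
class ResourceDriver:
    """Hypothetical driver that delegates device attachment to the fabric."""

    fabric: FabricClient
    allocations: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def allocate(self, claim: str, node: str) -> str:
        # Allocation step: ask the fabric for a device and record the choice.
        device = self.fabric.attach_gpu(node)
        self.allocations[claim] = (node, device)
        return device

    def prepare(self, claim: str) -> Dict[str, str]:
        # Node-side step: report which device the container should receive.
        node, device = self.allocations[claim]
        return {"node": node, "device": device}


driver = ResourceDriver(fabric=InMemoryFabric())
driver.allocate("claim-a", "node-1")
print(driver.prepare("claim-a"))
```

Under this assumption, the fabric-specific logic lives behind one small interface, which is the component the question suggests might have to be written separately.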
> Please see this document on why this is not feasible under the current Kubernetes resource model: Challenges Supporting Multi-Instance GPUs (MIG) in Kubernetes
> Once the following newly accepted Kubernetes Enhancement proposal gets implemented, we will be able to build a device plugin that properly supports what you suggest: kubernetes/enhancements#3064
Looks like kubernetes/enhancements#3064 has merged! Any thoughts on this ask?
Right now I am using https://github.com/nebuly-ai/nos for dynamic GPU partitioning. It serves the purpose for now, but I am facing issues when using it with Karpenter. These days the group is not active enough to contribute a solution. I am hoping NVIDIA will come up with a plugin to address this requirement.