Time-slicing with multiple GPUs - asking for ability to block single GPU
1. Issue or feature description
I'm looking for the ability to configure the scheduler to do the exact opposite of the behavior described here: https://github.com/NVIDIA/gpu-operator/issues/386
Instead of allocating GPU resources evenly across all GPUs on the node, I'd like a config option to allocate from one GPU at a time. That would let some applications get exclusive access to a single GPU when they need it, while the remaining workloads continue to time-share the other GPUs.
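To make that concrete, here is a small standalone sketch (not gpu-operator code; the device IDs and grouping are invented for the example) of the selection behavior I'm after: with time-slicing set to 2 replicas per GPU on a 2-GPU node, a request for 2 resources would be satisfied entirely from one physical GPU instead of one replica from each.

```go
// Standalone illustration only: "pack" selects the requested number of
// time-sliced replica IDs from as few physical GPUs as possible, which is
// the opposite of the spread-across-all-GPUs behavior from issue #386.
package main

import (
	"fmt"
	"sort"
)

// pack picks `need` replica IDs, exhausting one GPU's replicas before
// moving on to the next GPU.
func pack(replicasByGPU map[string][]string, need int) []string {
	gpus := make([]string, 0, len(replicasByGPU))
	for gpu := range replicasByGPU {
		gpus = append(gpus, gpu)
	}
	sort.Strings(gpus) // deterministic order for the example

	var picked []string
	for _, gpu := range gpus {
		for _, id := range replicasByGPU[gpu] {
			if len(picked) == need {
				return picked
			}
			picked = append(picked, id)
		}
	}
	return picked
}

func main() {
	// Two physical GPUs, each advertised as 2 time-sliced replicas
	// (the ID naming is made up for this example).
	replicas := map[string][]string{
		"GPU-0": {"GPU-0-replica-0", "GPU-0-replica-1"},
		"GPU-1": {"GPU-1-replica-0", "GPU-1-replica-1"},
	}
	// A request for 2 resources lands entirely on GPU-0:
	fmt.Println(pack(replicas, 2)) // [GPU-0-replica-0 GPU-0-replica-1]
}
```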
2. Steps to reproduce the issue
- Perform a fresh installation of the GPU operator in a cluster whose nodes have more than one GPU.
- Enable time-slicing and configure it to allow 2 replicas per GPU.
- Start a pod that requests 2 of the GPU extended resource.
- The pod should get exclusive access to a single GPU, but instead it gets shared access to two GPUs (as intended by https://github.com/NVIDIA/gpu-operator/issues/386).
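For the third step, this is the kind of pod I'm using, shown here as a Go object for concreteness (an equivalent YAML manifest behaves the same; the pod name, image, and command are just placeholders):

```go
// Repro pod: requests 2 of the nvidia.com/gpu extended resource on a node
// where time-slicing advertises 2 replicas per physical GPU.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func gpuTestPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "gpu-test"},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:    "cuda-test",
				Image:   "nvidia/cuda:12.2.0-base-ubuntu22.04", // placeholder image
				Command: []string{"nvidia-smi", "-L"},          // lists the GPUs the pod sees
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						// Two time-sliced replicas of the GPU extended resource.
						"nvidia.com/gpu": resource.MustParse("2"),
					},
				},
			}},
		},
	}
}

func main() {
	fmt.Println(gpuTestPod().Spec.Containers[0].Resources.Limits)
}
```

With the current spread behavior, `nvidia-smi -L` inside this pod lists two different physical GPUs; with the packing behavior I'm asking for, it would list only one.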
@klueska does it make sense to introduce knobs (env/args) to control the allocation logic during GetPreferredAllocation within the device plugin?
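Roughly the kind of thing I mean, sketched against the standard kubelet device plugin API (v1beta1); the env var name, the policy values, and the selection logic here are hypothetical, not existing plugin behavior:

```go
// Sketch only: a per-plugin knob (an env var here, but it could equally be a
// CLI arg or a config-file field) that switches GetPreferredAllocation
// between the existing "spread" behavior and a "pack onto one physical GPU"
// behavior.
package example

import (
	"context"
	"os"
	"sort"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

type plugin struct{}

// GetPreferredAllocation tells the kubelet which of the available device IDs
// the plugin would prefer to see allocated for each container request.
func (p *plugin) GetPreferredAllocation(
	ctx context.Context,
	req *pluginapi.PreferredAllocationRequest,
) (*pluginapi.PreferredAllocationResponse, error) {
	// Hypothetical knob: "pack" or "spread" (default).
	policy := os.Getenv("DEVICE_PLUGIN_ALLOCATION_POLICY")

	resp := &pluginapi.PreferredAllocationResponse{}
	for _, creq := range req.ContainerRequests {
		ids := append([]string{}, creq.AvailableDeviceIDs...)
		sort.Strings(ids) // replica IDs of the same physical GPU sort next to each other

		var preferred []string
		if policy == "pack" && int(creq.AllocationSize) <= len(ids) {
			// Take replicas in sorted order so they come from as few
			// physical GPUs as possible.
			preferred = ids[:creq.AllocationSize]
		}
		// For "spread" this sketch simply returns no preference; the real
		// plugin would keep its existing distributed selection here.
		resp.ContainerResponses = append(resp.ContainerResponses,
			&pluginapi.ContainerPreferredAllocationResponse{DeviceIDs: preferred})
	}
	return resp, nil
}
```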
@shivamerla I believe there are cases where we would want distributed GPU scheduling, but there are also scenarios that call for the opposite. It would be great if this setting could be easily changed via a ConfigMap or similar.