Return all available devices directly if the preferred allocation requests all of them

Status: Open. zwpaper opened this issue 2 years ago • 4 comments

We found that, on nodes with multiple GPUs, retrieving the status of all GPUs can take several seconds when the kubelet calls GetPreferredAllocation.

If the kubelet is requesting all of the available devices, the device plugin could simply return the available devices directly.
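The proposed shortcut can be sketched as follows. When the requested allocation size equals the number of available devices, the only subset of that size is the full available set, so there is nothing to rank. This is a minimal sketch, not the code from the merge request. The containerRequest type is a hypothetical, simplified stand-in for the kubelet's ContainerPreferredAllocationRequest (which carries AvailableDeviceIDs, MustIncludeDeviceIDs and AllocationSize). The slowPath callback is a hypothetical placeholder for the plugin's existing NVML-based selection logic.

```go
package main

import "fmt"

// containerRequest is a hypothetical, simplified mirror of the kubelet's
// ContainerPreferredAllocationRequest, trimmed for illustration.
type containerRequest struct {
	AvailableDeviceIDs   []string
	MustIncludeDeviceIDs []string
	AllocationSize       int
}

// preferredAllocation picks device IDs for one container. slowPath is a
// hypothetical stand-in for the existing NVML-based selection logic.
func preferredAllocation(req containerRequest, slowPath func(containerRequest) ([]string, error)) ([]string, error) {
	// Fast path: if the kubelet asks for exactly as many devices as are
	// available, the full available set is the only possible answer, so
	// skip the expensive per-GPU queries entirely.
	if req.AllocationSize == len(req.AvailableDeviceIDs) {
		ids := make([]string, len(req.AvailableDeviceIDs))
		copy(ids, req.AvailableDeviceIDs) // return a copy, not the caller's slice
		return ids, nil
	}
	// Otherwise fall back to the normal (slower) selection.
	return slowPath(req)
}

func main() {
	calls := 0
	slow := func(r containerRequest) ([]string, error) {
		calls++ // count how often the slow path runs
		return r.AvailableDeviceIDs[:r.AllocationSize], nil
	}

	all, _ := preferredAllocation(containerRequest{
		AvailableDeviceIDs: []string{"GPU-0", "GPU-1", "GPU-2", "GPU-3"},
		AllocationSize:     4,
	}, slow)
	fmt.Println(all, calls) // [GPU-0 GPU-1 GPU-2 GPU-3] 0

	some, _ := preferredAllocation(containerRequest{
		AvailableDeviceIDs: []string{"GPU-0", "GPU-1", "GPU-2", "GPU-3"},
		AllocationSize:     2,
	}, slow)
	fmt.Println(some, calls) // [GPU-0 GPU-1] 1
}
```

Because any must-include devices are themselves drawn from the available set, returning the whole set also satisfies them. Partial requests still go through the normal selection path.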

We made this change and found that it works as expected and is much faster.

So we are opening this PR to discuss the approach:

  1. Is it OK to skip the subsequent NVML detection steps in GetPreferredAllocation in this case?
  2. Is there anything else we should consider before making this change?

zwpaper, Nov 30 '23 07:11

Note that this repository is a read-only mirror of https://gitlab.com/nvidia/kubernetes/device-plugin.

If possible, please create a merge request there instead and close this PR.

elezar, Nov 30 '23 10:11

Hi @elezar, thanks for the info. I created one at https://gitlab.com/nvidia/kubernetes/device-plugin/-/merge_requests/339

zwpaper, Dec 01 '23 02:12

@zwpaper sorry for the back and forth. We are in the process of migrating to GitHub as our primary repository. I am reopening this PR and closing the GitLab one.

elezar, Jan 28 '24 21:01