
Is restarting the plugin the only way to update the node GPU profile after mig-enabled GPUs get repartitioned?

Open WindowsXp-Beta opened this issue 2 years ago • 2 comments

Hello, here is my use case. I have a cluster of MIG-enabled GPUs, and I want to repartition them frequently (deleting some MIG instances and creating new ones).

So I want to know: is restarting the plugin the only way to relabel all GPU instances? By relabeling I mean that k8s removes the old MIG instances from the node profile and adds the new ones, so that it can assign the right MIG instance to a Pod.
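To make the request concrete, here is a minimal sketch of the bookkeeping I have in mind. It is not the plugin's actual code: the function name `diff_mig_instances` and the instance IDs are hypothetical, and I am assuming the node's MIG layout can be summarized as a mapping from instance ID to profile name.

```python
def diff_mig_instances(old, new):
    """Compare two {instance_id: profile} mappings (hypothetical layout summary).

    Returns (removed, added): instances present only before the
    repartition, and instances present only after it.
    """
    removed = {k: v for k, v in old.items() if k not in new}
    added = {k: v for k, v in new.items() if k not in old}
    return removed, added


# Hypothetical layouts before and after a repartition.
before = {"MIG-a": "1g.5gb", "MIG-b": "1g.5gb", "MIG-c": "2g.10gb"}
after = {"MIG-c": "2g.10gb", "MIG-d": "3g.20gb"}

removed, added = diff_mig_instances(before, after)
```

Ideally the node profile would drop everything in `removed` and advertise everything in `added` without the whole plugin having to restart.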

If so, since I'm new to k8s, I'd like to know why there is no other way to update the node after the MIG configuration changes. Restarting the plugin takes many times longer than repartitioning the GPU itself.

WindowsXp-Beta · Mar 11 '23 13:03

@klueska If this feature is not implemented yet, I'd like to dive into the source code and see whether I can implement it. Could you give me some hints or advice?

WindowsXp-Beta · Mar 14 '23 06:03

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] · Feb 28 '24 04:02