Can k8s-device-plugin manage and allocate GPUs automatically?

Open un-human opened this issue 5 years ago • 1 comments

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Issue or feature description

Issue: Can k8s-device-plugin manage and allocate GPUs automatically?

2. Steps to reproduce the issue

1. node-1 has one GPU and node-2 has one GPU.
2. I create 3 gpu-pods.

question-1:
Will k8s-device-plugin place one gpu-pod on node-1 and one gpu-pod on node-2, and leave the third gpu-pod pending? Does it allocate GPUs across different nodes automatically?

question-2:

If the gpu-pod on node-2 is deleted first, can the third gpu-pod go from pending to running on node-2? (third gpu-pod can running not Unavailable,error)

Can k8s-device-plugin manage and allocate GPUs automatically, or do I need to do it myself?

3. Information to attach (optional if deemed irrelevant)

Common error checking:

  • [ ] The output of nvidia-smi -a on your host
  • [ ] Your docker configuration file (e.g: /etc/docker/daemon.json)
  • [ ] The k8s-device-plugin container logs
  • [ ] The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)

Additional information that might help better understand your environment and reproduce the bug:

  • [ ] Docker version from docker version
  • [ ] Docker command, image and tag used
  • [ ] Kernel version from uname -a
  • [ ] Any relevant kernel output lines from dmesg
  • [ ] NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
  • [ ] NVIDIA container library version from nvidia-container-cli -V
  • [ ] NVIDIA container library logs (see troubleshooting)

un-human avatar Dec 15 '20 11:12 un-human

question-1: Will k8s-device-plugin place one gpu-pod on node-1 and one gpu-pod on node-2, and leave the third gpu-pod pending? Does it allocate GPUs across different nodes automatically?

It's not really the plugin that does this, but yes, that is what will happen. The Kubernetes scheduler places each 1-GPU pod on a node that still has a free GPU, so with one GPU per node the first two pods land on different nodes and the third stays pending. The plugin on each node then makes sure the pod gets access to the GPU on that node.
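For reference, a "1-GPU pod" is one whose container lists a GPU under its resource limits. The sketch below builds three such manifests in Python. It assumes the NVIDIA device plugin is advertising GPUs under the resource name `nvidia.com/gpu`; the helper `gpu_pod_manifest`, the pod names, and the image `example.com/my-gpu-image:latest` are hypothetical and not taken from this issue.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that asks for `gpus` whole GPUs.

    Hypothetical helper for illustration only.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,  # placeholder image, an assumption
                    # GPUs are requested as an extended resource in `limits`.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Three pods, each asking for one GPU, as in the scenario described above.
manifests = [
    gpu_pod_manifest(f"gpu-pod-{i}", "example.com/my-gpu-image:latest")
    for i in (1, 2, 3)
]

print(json.dumps(manifests[0], indent=2))
```

Each manifest could be written to a JSON file and submitted with `kubectl apply -f`. Nothing in the manifest names a node; choosing one is left to the scheduler.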

question-2: If the gpu-pod on node-2 is deleted first, can the third gpu-pod go from pending to running on node-2? (third gpu-pod can running not Unavailable,error) Can k8s-device-plugin manage and allocate GPUs automatically, or do I need to do it myself?

I don't know what you mean by "(third gpu-pod can running not Unavailable,error)", but yes, Kubernetes will make sure to deploy and run pod-3 once pod-2 completes or is deleted and its GPU is released. Again, it's not the plugin that does this, but just the way Kubernetes works.
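The behaviour described in both answers can be illustrated with a toy model. This is only a sketch of the bookkeeping (one free-GPU counter per node and a pending queue), not the real Kubernetes scheduler or the device plugin; the class `ToyCluster` and all of its methods are hypothetical.

```python
from typing import Dict, List, Optional


class ToyCluster:
    """Toy model: each pod needs exactly one GPU; nodes have a fixed GPU count."""

    def __init__(self, free_gpus: Dict[str, int]) -> None:
        self.free_gpus = dict(free_gpus)      # node name -> unallocated GPUs
        self.placement: Dict[str, str] = {}   # pod name -> node it runs on
        self.pending: List[str] = []          # pods waiting for a free GPU

    def _find_node(self) -> Optional[str]:
        # Pick the first node that still has a free GPU, if any.
        for node, free in self.free_gpus.items():
            if free >= 1:
                return node
        return None

    def submit(self, pod: str) -> None:
        node = self._find_node()
        if node is None:
            self.pending.append(pod)          # no GPU anywhere: stay pending
        else:
            self.free_gpus[node] -= 1
            self.placement[pod] = node

    def delete(self, pod: str) -> None:
        # Release the pod's GPU, then retry every pending pod in order.
        node = self.placement.pop(pod)
        self.free_gpus[node] += 1
        waiting, self.pending = self.pending, []
        for waiting_pod in waiting:
            self.submit(waiting_pod)

    def status(self, pod: str) -> str:
        if pod in self.placement:
            return f"Running on {self.placement[pod]}"
        return "Pending" if pod in self.pending else "Unknown"


pods = ("gpu-pod-1", "gpu-pod-2", "gpu-pod-3")
cluster = ToyCluster({"node-1": 1, "node-2": 1})
for pod in pods:
    cluster.submit(pod)

before = {pod: cluster.status(pod) for pod in pods}
cluster.delete("gpu-pod-2")                   # frees the GPU on node-2
after = cluster.status("gpu-pod-3")

print(before)
print(after)
```

In this model the third pod waits until a GPU is released and then starts on whichever node freed it, with no manual placement by the user.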

klueska avatar Dec 15 '20 11:12 klueska

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Feb 29 '24 04:02 github-actions[bot]

This issue was automatically closed due to inactivity.

github-actions[bot] avatar Mar 31 '24 04:03 github-actions[bot]