k8s-device-plugin

I want to deploy three models: a large language model occupying one GPU, and an embedding model and a re-ranking model sharing another GPU. How can I do it?

Open • Flynn-Zh opened this issue 1 year ago

There are two GPU devices on the Kubernetes node, and `timeSlicing.replicas` is set to 2. The large language model's pod requests `nvidia.com/gpu: 2` and the other two models' pods request `nvidia.com/gpu: 1` each, but the large language model's pod ends up with two GPU devices instead of one.

Flynn-Zh • Jun 14 '24
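The behavior reported above can be modeled with a short Python sketch. It assumes replica IDs of the form `gpu0::1` and an allocator that spreads a multi-replica request across physical GPUs to balance load; the ID format and the `spread_alloc` helper are hypothetical illustrations, not the device plugin's actual code. Under those assumptions, the node advertises 4 schedulable `nvidia.com/gpu` replicas, and a 2-replica request receives one replica from each physical GPU, which matches what was observed.

```python
from collections import defaultdict

# Hypothetical node: two physical GPUs, each time-sliced into 2 replicas
# (mirrors timeSlicing.replicas: 2 from the issue).
PHYSICAL_GPUS = ["gpu0", "gpu1"]
REPLICAS = 2


def advertised_devices(gpus, replicas):
    """Replica IDs a time-sliced node would advertise (ID format is assumed)."""
    return [f"{g}::{r}" for g in gpus for r in range(replicas)]


def physical(replica_id):
    """Map a replica ID such as 'gpu0::1' back to its physical GPU."""
    return replica_id.split("::")[0]


def spread_alloc(available, size):
    """Hypothetical spreading preference: repeatedly take one replica from the
    GPU that currently has the most free replicas."""
    if size > len(available):
        raise ValueError("not enough free replicas")
    free = defaultdict(list)
    for dev in available:
        free[physical(dev)].append(dev)
    chosen = []
    for _ in range(size):
        gpu = max(free, key=lambda g: len(free[g]))
        chosen.append(free[gpu].pop(0))
        if not free[gpu]:
            del free[gpu]
    return chosen


devices = advertised_devices(PHYSICAL_GPUS, REPLICAS)
print(len(devices))  # 4 replicas of nvidia.com/gpu for 2 physical GPUs
llm = spread_alloc(devices, 2)
print(llm, sorted({physical(d) for d in llm}))
# ['gpu0::0', 'gpu1::0'] ['gpu0', 'gpu1']: the 2-replica pod spans both GPUs
```

Because every replica counts as one interchangeable `nvidia.com/gpu`, nothing in the request itself says "two replicas of the same physical GPU".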

You can't do this with the standard device plugin. You will need to wait until DRA (Dynamic Resource Allocation) is available: https://docs.google.com/document/d/1BNWqgx_SmZDi-va_V31v3DnuVwYnF2EmN7D-O_fB6Oo/edit#heading=h.bxuci8gx6hna

klueska • Jun 14 '24

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] • Sep 13 '24

DRA cannot be used in production environments. Could we modify the GPU allocation strategy in the device plugin code to implement this instead? Can anyone give me some help?

Flynn-Zh • Sep 24 '24

For example: with time-slicing enabled and `replicas` set to 2, the pod whose limit is 2 should be assigned one whole GPU, and the other two pods, whose limits are 1 each, should both be assigned to the same remaining GPU.

Flynn-Zh • Sep 24 '24
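The allocation proposed in the last comment can be sketched as a best-fit policy. This is a minimal Python model, assuming the hypothetical `gpuN::M` replica IDs and helper names (`best_fit_alloc`, `run`) used only for illustration; it is not the device plugin's code. A request is placed on the single GPU with the fewest free replicas that can still hold it, so 1-replica pods fill a partly used GPU and a whole GPU stays free for the 2-replica pod in both arrival orders shown. It only models a per-request preference on one node; the scheduler still sees interchangeable `nvidia.com/gpu` replicas.

```python
from collections import defaultdict


def physical(replica_id):
    """Map a replica ID such as 'gpu0::1' to its physical GPU (ID format assumed)."""
    return replica_id.split("::")[0]


def best_fit_alloc(available, size):
    """Hypothetical policy: pick `size` replicas, preferring a single physical GPU.

    Among GPUs that can hold the whole request alone, choose the one with the
    fewest free replicas (best fit). If none fits, span GPUs, taking from the
    GPUs with the most free replicas first.
    """
    if size > len(available):
        raise ValueError("not enough free replicas")
    free = defaultdict(list)
    for dev in available:
        free[physical(dev)].append(dev)
    fits = [g for g in free if len(free[g]) >= size]
    if fits:
        gpu = min(fits, key=lambda g: len(free[g]))
        return free[gpu][:size]
    chosen = []
    for gpu in sorted(free, key=lambda g: len(free[g]), reverse=True):
        chosen.extend(free[gpu][: size - len(chosen)])
        if len(chosen) == size:
            break
    return chosen


def run(order):
    """Allocate (pod, replicas) requests in order on a hypothetical 2-GPU x 2-replica node."""
    available = [f"gpu{g}::{r}" for g in range(2) for r in range(2)]
    placement = {}
    for pod, size in order:
        got = best_fit_alloc(available, size)
        available = [d for d in available if d not in got]
        placement[pod] = sorted({physical(d) for d in got})
    return placement


print(run([("llm", 2), ("embedding", 1), ("reranker", 1)]))
print(run([("embedding", 1), ("reranker", 1), ("llm", 2)]))
```

In the first order the LLM takes `gpu0` and both small pods share `gpu1`; in the second the small pods share `gpu0` and the LLM gets all of `gpu1`.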