serve
GPU sharing support across models
🚀 The feature
We need a feature for sharing a single GPU across multiple models. It could be configured by setting 0 < workers < 1 for a model.
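As a sketch of what the proposed configuration might look like in config.properties (the fractional `workers` value shown here is hypothetical — it is the requested feature, not an existing TorchServe option):

```
# config.properties (hypothetical fractional-worker syntax)
# Two models share one GPU; each gets a fraction of its capacity.
models={\
  "model_a": {"1.0": {"workers": 0.2}},\
  "model_b": {"1.0": {"workers": 0.8}}\
}
```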
Motivation, pitch
Currently one of my models uses only about 15% of the GPU. I want multiple models to share the GPU simultaneously, so that the remaining capacity can be used by other models at the same time.
Alternatives
No response
Additional context
No response
@msaroufim
@abhinav-cashify @amit-cashify this is already on our roadmap.
@abhinav-cashify @amit-cashify out of curiosity, what kind of GPU are you using? Starting with Ampere, NVIDIA has added support for MIG (Multi-Instance GPU) to allow resource isolation and partial GPU allocation. We would just need to make sure to pass the right device id from config.properties to the handler: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
Here's how to use it in your PyTorch code https://discuss.pytorch.org/t/access-gpu-partitions-in-mig/142272
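A minimal sketch of the "pass the device id from config.properties to the handler" idea, following the approach from the PyTorch forum thread: pin each worker process to one MIG slice by setting `CUDA_VISIBLE_DEVICES` to the slice's UUID before any CUDA initialization. The `mig_device_uuid` key and the placeholder UUID below are assumptions for illustration, not existing TorchServe options:

```python
import os

def select_mig_device(properties):
    """Pin this worker to one MIG slice before any CUDA initialization.

    `properties` stands in for the worker's parsed config.properties
    entries; the key name 'mig_device_uuid' is hypothetical.
    """
    uuid = properties.get("mig_device_uuid")
    if uuid:
        # Must be set before the first CUDA call (e.g. before
        # torch.cuda is initialized); afterwards the MIG slice
        # appears to PyTorch as ordinary device "cuda:0".
        os.environ["CUDA_VISIBLE_DEVICES"] = uuid
    return uuid

# Real MIG UUIDs can be listed with `nvidia-smi -L`; this is a placeholder.
select_mig_device({"mig_device_uuid": "MIG-00000000-0000-0000-0000-000000000000"})
# import torch
# device = torch.device("cuda:0")  # resolves to the selected MIG slice
```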
If you get it working, I'd be happy to merge your contribution; otherwise, we can look into this in our next sprint.