serve
GPU sharing support across models
🚀 The feature
We need a feature for sharing a single GPU across multiple models. It could be configured by setting 0 < workers < 1 for a model.
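As a sketch of what the proposed configuration might look like in config.properties (the fractional `workers` value shown here is hypothetical — it is the requested feature, not an existing TorchServe option):

```
# config.properties (hypothetical fractional-worker syntax)
# Two models share one GPU; each gets a fraction of its capacity.
models={\
  "model_a": {"1.0": {"workers": 0.2}},\
  "model_b": {"1.0": {"workers": 0.8}}\
}
```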
Motivation, pitch
Currently one of my models uses only about 15% of the GPU. I want multiple models to share the GPU simultaneously, so that the remaining capacity can be used by other models at the same time.
Alternatives
No response
Additional context
No response
@msaroufim
@abhinav-cashify @amit-cashify this is already on our roadmap.
@abhinav-cashify @amit-cashify out of curiosity, what kind of GPU are you using? Starting with Ampere, NVIDIA has added support for MIG (Multi-Instance GPU) to allow resource isolation and partial GPU allocation. We would just need to make sure to pass the right device id from config.properties to the handler: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
Here's how to use it in your PyTorch code https://discuss.pytorch.org/t/access-gpu-partitions-in-mig/142272
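A minimal sketch of the "pass the device id from config.properties to the handler" idea, following the approach from the PyTorch forum thread: pin each worker process to one MIG slice by setting `CUDA_VISIBLE_DEVICES` to the slice's UUID before any CUDA initialization. The `mig_device_uuid` key and the placeholder UUID below are assumptions for illustration, not existing TorchServe options:

```python
import os

def select_mig_device(properties):
    """Pin this worker to one MIG slice before any CUDA initialization.

    `properties` stands in for the worker's parsed config.properties
    entries; the key name 'mig_device_uuid' is hypothetical.
    """
    uuid = properties.get("mig_device_uuid")
    if uuid:
        # Must be set before the first CUDA call (e.g. before
        # torch.cuda is initialized); afterwards the MIG slice
        # appears to PyTorch as ordinary device "cuda:0".
        os.environ["CUDA_VISIBLE_DEVICES"] = uuid
    return uuid

# Real MIG UUIDs can be listed with `nvidia-smi -L`; this is a placeholder.
select_mig_device({"mig_device_uuid": "MIG-00000000-0000-0000-0000-000000000000"})
# import torch
# device = torch.device("cuda:0")  # resolves to the selected MIG slice
```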
If you get it working, I'd be happy to merge your contribution; otherwise, we can look into this in our next sprint.