
Supporting MIG with multi model instance groups

Open lazy-nurd opened this issue 2 years ago • 1 comment

Is your feature request related to a problem? Please describe. I split an A100-80G into 7 MIG devices to run model inference in parallel, but there is no way to specify the MIG device in config.txt, which is blocking us from deploying our models to production. The only guidance available is ( https://developer.nvidia.com/blog/getting-the-most-out-of-the-a100-gpu-with-multi-instance-gpu/?ncid=afm-chs-44270&ranMID=44270&ranEAID=a1LgFw09t88&ranSiteID=a1LgFw09t88-82FqiN7UfWN8TQPTChNndw ), which shows how to assign MIG devices at the Docker level. However, we want to use the same Docker container for multiple instances and assign the MIG device per instance in config.txt.
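For context, docker-level MIG assignment (the approach in the linked blog post) looks roughly like the commands below. This is a sketch, not an endorsed workflow: the MIG UUID and the image tag `<xx.yy>` are placeholders, and the `device=0:0` form assumes GPU 0, MIG instance 0.

```shell
# Pin the whole container to one MIG device by GPU:MIG-instance index
# (placeholder image tag).
docker run --rm --gpus '"device=0:0"' \
    nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
    tritonserver --model-repository=/models

# Or pin it by MIG device UUID via the NVIDIA runtime environment variable
# (placeholder UUID).
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=MIG-<uuid> \
    nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
    tritonserver --model-repository=/models
```

Either way, the selection happens once per container, which is why this does not help when one container should host several instances on different MIG devices.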

Describe the solution you'd like The ability to set the MIG device ID in config.txt.


lazy-nurd avatar Sep 19 '22 16:09 lazy-nurd

Hello, what problem are you facing when doing this? We haven't tested this feature yet, but there are alternatives you can try.

  1. You can run inference with multiple model instances using the instance group setting.
  2. MIG is typically used for cloud deployments where the client sends requests to multiple servers (each running on a different MIG device) and the orchestration is handled by pods, as described in the doc you shared.
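For reference, the instance group mentioned in item 1 is configured in the model configuration file. A minimal sketch (assuming a Triton `config.pbtxt` in protobuf text format) looks like this:

```
# Minimal sketch of an instance_group block in a Triton model configuration.
# Note that `gpus` takes GPU indices; there is no field for selecting a
# specific MIG device, which is the gap this issue describes.
instance_group [
  {
    count: 2        # run two execution instances of this model
    kind: KIND_GPU
    gpus: [ 0 ]     # GPU index, not a MIG device UUID
  }
]
```

Because `gpus` addresses whole GPUs, each MIG device currently has to be exposed to its own server process (e.g. one container per MIG device), rather than selected per instance group.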

Can you explain a bit why these solutions do not work for you?

jbkyang-nvi avatar Sep 23 '22 01:09 jbkyang-nvi

Closing issue due to inactivity. If you would like to follow up or have further questions, please let us know and we will reopen the issue.

dyastremsky avatar Oct 04 '22 15:10 dyastremsky