
GPU Isolation and flexible deployment strategies [FEA]


Is your feature request related to a problem? Please describe.
Consider a few scenarios where we need to:

  • deploy multiple models for a single application.
  • deploy multiple models on the same machine across different GPU architectures.
  • lock in resources for deployment so that training can use the remaining resources.

In all these scenarios, we want to assign a GPU to a model and do not want the inference service to take up the entire system. If we can isolate a GPU and pin it to a particular deployment, it will be really useful. It will also future-proof our deployments: imagine we get new GPUs with a new architecture, and the deployment, the model, or the PyTorch version does not work with it. In such a case, we can add the new GPUs without disturbing the existing deployments.
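For instance, pinning a process to a specific GPU is commonly done with the CUDA_VISIBLE_DEVICES environment variable. A minimal sketch (not specific to this SDK; the device index 1 is just an example, and the variable must be set before CUDA is initialized):

```python
import os

# Expose only physical GPU 1 to this process; must happen before any
# CUDA-using library (e.g. torch) initializes the driver.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# Within this process the single visible GPU is re-indexed as cuda:0.
device = torch.device("cuda:0")
model = torch.nn.Linear(4, 2).to(device)  # placeholder for a real model
print(torch.cuda.device_count())          # -> 1
```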

Describe alternatives you've considered
@slbryson has tried GPU isolation using the Clara CLI tools.

Additional context

vikashg (Jan 21 '22 20:01)

This also ties in loosely to what @MMelQin was mentioning about trying to have multiple models deployed in a MAP.

vikashg (Jan 21 '22 21:01)

This is definitely a good request for a much-needed capability, though it is more relevant to a deployment platform. For example, Clara inference operators/applications use the remote Triton Inference Service, which supports model-to-GPU affinity, the number of instances per model, etc., so Triton configuration can be used to distribute model instances across GPUs.
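As an illustration, Triton's per-model config.pbtxt exposes this through its instance_group setting. A minimal sketch (the model name, platform, instance count, and GPU ID below are example values):

```
name: "example_model"
platform: "pytorch_libtorch"

# Run a single instance of this model, pinned to GPU 0 only.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```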

The App SDK does have an issue for utilizing a remote Triton inference service: #212

As for multi-model support (#244), when all the inference operators use in-proc inference, it is possible to:

  • link the operators in the app (via application.add_flow()) in such a way that only one inference operator can run at any given time, so that the GPU is not overloaded (see the sketch after this list).
  • potentially enhance the model loading logic in the App SDK base Application to make use of a specific GPU if so configured, though this becomes moot if remote Triton is used.
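A minimal sketch of the first option, chaining two inference operators serially with Application.add_flow(). The operator classes and their compute bodies are hypothetical placeholders, not actual SDK operators:

```python
import monai.deploy.core as md
from monai.deploy.core import Application, Image, IOType, Operator

# Hypothetical in-proc inference operator; a real one would load its
# model once and run inference in compute().
@md.input("image", Image, IOType.IN_MEMORY)
@md.output("image", Image, IOType.IN_MEMORY)
class InferenceOpA(Operator):
    def compute(self, op_input, op_output, context):
        image = op_input.get("image")
        # ... run model A on image ...
        op_output.set(image, "image")

@md.input("image", Image, IOType.IN_MEMORY)
@md.output("image", Image, IOType.IN_MEMORY)
class InferenceOpB(Operator):
    def compute(self, op_input, op_output, context):
        image = op_input.get("image")
        # ... run model B on image ...
        op_output.set(image, "image")

class MultiModelApp(Application):
    def compose(self):
        op_a = InferenceOpA()
        op_b = InferenceOpB()
        # Chaining B after A serializes execution, so only one model
        # occupies the GPU at any given time.
        self.add_flow(op_a, op_b, {"image": "image"})
```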

MMelQin (Jan 22 '22 00:01)