
[Need Help] Model orchestration documentation

Open Phelan164 opened this issue 3 years ago • 2 comments

Overview

As a user, I want ModelMesh to automatically and efficiently orchestrate models onto the available runtime servers, so that I don't need to care about where each model is placed.

Acceptance Criteria

Questions

  • Sorry for asking this; I am new here and interested in using this project for model orchestration. I am not sure whether ModelMesh already supports this. I tried to find tutorial documents, but I only found ones about deploying a single model. If it is already supported, could you point me to some documents or guidelines? I have some questions:
  • When does it provision more runtime servers?
  • How does it find a suitable runtime server for deployment?
  • How about scalability? (I tried with 2 Triton runtime servers and deployed a few models, then checked and found that some model weights had been downloaded to both Triton runtime servers.) Sorry for my lack of understanding. Thanks!

Assumptions

Reference

Phelan164 · Jan 10 '22 04:01

Hey @Phelan164, thanks for trying out ModelMesh!

The number of runtime pods is determined by the podsPerRuntime configuration setting and isn't currently scaled dynamically. More info about this setting and scaling can be found here.
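For reference, this setting is supplied through the model-serving-config ConfigMap that the controller reads; a minimal sketch, with the namespace and value as illustrative assumptions:

```yaml
# Sketch of the user configuration ConfigMap (values are illustrative).
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
  namespace: modelmesh-serving  # assumes the default install namespace
data:
  config.yaml: |
    # Number of pods per ServingRuntime deployment (not autoscaled).
    podsPerRuntime: 2
```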

For model placement, ModelMesh first finds a ServingRuntime that lists a compatible model type/format in its supportedModelFormats. Which pod of the selected ServingRuntime deployment receives the model is then determined by a few factors, such as per-pod request load and cache age. ModelMesh also manages how many copies of a model are loaded: recently used models have at least two copies loaded, and this number can scale up or down based on usage.
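To illustrate the format matching described above, here is a rough sketch of the two pieces involved (the names, path, and storage secret are placeholders, not taken from this issue):

```yaml
# A ServingRuntime advertises which model formats it can serve.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-2.x
spec:
  supportedModelFormats:
    - name: tensorflow
      version: "2"
      autoSelect: true
  # ... container spec omitted ...
---
# A Predictor declares its model type; ModelMesh matches it to a
# compatible ServingRuntime, then picks a pod based on load/cache state.
apiVersion: serving.kserve.io/v1alpha1
kind: Predictor
metadata:
  name: tensorflow-mnist  # hypothetical example model
spec:
  modelType:
    name: tensorflow
  path: tensorflow/mnist.savedmodel  # path within the storage bucket
  storage:
    s3:
      secretKey: localMinIO  # assumes a storage secret as in the quickstart
```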

Some useful links might be:
  • https://developer.ibm.com/blogs/kserve-and-watson-modelmesh-extreme-scale-model-inferencing-for-trusted-ai/
  • https://www.youtube.com/watch?v=rmYXPlzU4H8

pvaneck · Jan 12 '22 00:01

@pvaneck we have a use case with a stateful model (continuous learning based on feedback). For that case, is it possible to restrict a model to a single pod? Is there a config to control that?

We are not concerned about load on a single model, as these are individual users' personal models.

Thanks for the great project; this is definitely a useful initiative.

Nagarajj · Jan 22 '22 19:01