modelmesh-serving
[Need Help] Model orchestration documentation
Overview
As a user, I want ModelMesh to automatically and efficiently orchestrate models onto the available runtime servers, so that I don't need to care about where each model will be placed.
Questions
- Sorry for asking this; I'm new here and interested in using this project for model orchestration. I'm not sure whether ModelMesh already supports this. I tried to find some tutorial documents but haven't found anything beyond deploying a single model. If it is already supported, could you point me to some documents or guidelines? I have some questions:
- When does it provision more runtime servers?
- How does it find a suitable runtime server to deploy a model to?
- How does it handle scalability? (I tried with 2 Triton runtime servers and deployed a few models, then noticed that some model weights were downloaded to, and served from, both Triton runtime servers.) Sorry for my lack of understanding. Thanks!
Hey @Phelan164, thanks for trying out ModelMesh!
The number of pods per runtime deployment is determined by the podsPerRuntime
configuration setting, and pods aren't currently scaled dynamically. More info about this setting and scaling can be found here.
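For reference, this setting lives in the user ConfigMap that ModelMesh Serving reads its configuration from. A minimal sketch (the value `2` is just an example):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
data:
  config.yaml: |
    # Number of pods per ServingRuntime deployment; loaded model copies
    # are spread across these pods. ModelMesh does not scale this
    # dynamically, so change it here to resize a runtime.
    podsPerRuntime: 2
```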
For model placement, ModelMesh first finds a ServingRuntime that lists a compatible model type/format in its supportedModelFormats list. Which pod of the selected ServingRuntime deployment receives the model is generally determined by factors such as per-pod request load and cache age. ModelMesh also manages how many copies of a model are loaded: recently used models have at least two copies loaded, and this number can scale up or down based on usage.
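To illustrate the matching step, here is an abridged sketch of the two resources involved. The names (`triton-2.x`, `example-tf-model`, the `localMinIO` storage key, and the model path) are illustrative, not from this thread; the container spec of the runtime is omitted for brevity:

```yaml
# A multi-model runtime advertising the formats it can load.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-2.x
spec:
  supportedModelFormats:
    - name: tensorflow
      version: "2"
      autoSelect: true
  multiModel: true
  # containers: ... (runtime container spec omitted)
---
# A predictor whose modelFormat is matched against the list above.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-tf-model
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow          # matched to supportedModelFormats
      storage:
        key: localMinIO           # assumes an entry in the storage-config secret
        path: tensorflow/mnist.savedmodel
```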
Some useful links might be:
- https://developer.ibm.com/blogs/kserve-and-watson-modelmesh-extreme-scale-model-inferencing-for-trusted-ai/
- https://www.youtube.com/watch?v=rmYXPlzU4H8
@pvaneck we have a use case with a stateful model (continuous learning based on feedback). For that case, is it possible to restrict a model to a single pod? Is there a config to control that?
We are not concerned about load on a single model, as these are individual users' personal models.
Thanks for the great project, this is definitely a useful initiative.