modelmesh-serving
No information about how Predictor autoscaling works?
I don't see any information in the documentation about how Predictors in ModelMesh are autoscaled. How is autoscaling of the copies of a Predictor running in ModelMesh configured and managed? Which component of the architecture is responsible for it?
Hi @rehevkor5 -- for ModelMesh, scaling happens at the ServingRuntime pod level. Each runtime pod can host multiple predictors. Predictors are loaded and evicted based on runtime statistics, memory pressure, etc.
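To make the load/evict idea concrete, here is a rough sketch in Python. It is not ModelMesh's actual code: the `RuntimePod` class, the `serve` method, and the least-recently-used policy are all assumptions chosen for illustration, and the real placement logic takes more signals into account than this.

```python
from collections import OrderedDict
from typing import List


class RuntimePod:
    """Hypothetical stand-in for one ServingRuntime pod with a fixed memory budget."""

    def __init__(self, capacity_mb: int) -> None:
        self.capacity_mb = capacity_mb
        # Predictor name -> size in MB, ordered from least to most recently used.
        self.loaded: "OrderedDict[str, int]" = OrderedDict()

    def used_mb(self) -> int:
        return sum(self.loaded.values())

    def serve(self, name: str, size_mb: int) -> List[str]:
        """Serve a request for `name`, loading it first if it is not resident.

        Returns the names of predictors evicted to make room (assumed LRU policy).
        """
        if size_mb > self.capacity_mb:
            raise ValueError(f"{name} does not fit in this pod")
        evicted: List[str] = []
        if name in self.loaded:
            # Already resident: only refresh its recency.
            self.loaded.move_to_end(name)
            return evicted
        # Memory pressure: drop least recently used predictors until the new one fits.
        while self.used_mb() + size_mb > self.capacity_mb:
            victim, _ = self.loaded.popitem(last=False)
            evicted.append(victim)
        self.loaded[name] = size_mb
        return evicted


pod = RuntimePod(capacity_mb=100)
pod.serve("a", 40)
pod.serve("b", 40)
pod.serve("a", 40)               # "a" becomes the most recently used
evicted = pod.serve("c", 40)     # no room left, so the stalest predictor is dropped
print(evicted, list(pod.loaded))
```

In this toy model the pod count never changes; only the set of predictors resident in each pod does. That is the distinction being drawn above between scaling runtime pods and loading or evicting predictors.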
Here are a few links to other issues that might be helpful:
- https://github.com/kserve/modelmesh/issues/46
- https://github.com/kserve/modelmesh-serving/issues/329
- https://github.com/kserve/modelmesh-serving/issues/225#issuecomment-1244172486
- https://github.com/kserve/modelmesh-serving/issues/374#issuecomment-1562941339
- https://github.com/kserve/modelmesh-serving/issues/331#issuecomment-1435436756
- https://github.com/kserve/modelmesh-serving/issues/330#issuecomment-1435437958
There might be an opportunity to collect some of that information and add it to the ModelMesh docs.
Hi @ckadner, @rafvasq, @Jooho, I wanted to ask whether there is any ongoing work to improve the docs on HPA for modelmesh-serving. It's not clear from the current docs.
Thank you
I think it is a matter of collecting some info from past issues, PRs, PR review comments, etc. If you want to start a PR, we could collectively add to it.