modelmesh-serving No information about how Predictor autoscaling works?

Open rehevkor5 opened this issue 1 year ago • 3 comments

I don't see any information in the documentation about how Predictors in ModelMesh are autoscaled... How is autoscaling of the "copies" of a Predictor running in ModelMesh configured and managed? Which component of the architecture is responsible for it?

Sep 21 '23 21:09 rehevkor5

Hi @rehevkor5 -- for ModelMesh, scaling happens at the ServingRuntime pod level. Each runtime pod can host multiple predictors, which are loaded and evicted based on runtime statistics, memory pressure, etc.
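To make the load-and-evict idea concrete, here is a toy sketch in Python. It is not ModelMesh's actual algorithm: it assumes a single pod with a fixed memory budget and a plain least-recently-used policy, and the names `ToyRuntimePod`, `capacity_mb`, and `infer` are hypothetical, made up for illustration only.

```python
from collections import OrderedDict


class ToyRuntimePod:
    """Toy model of one runtime pod with a fixed memory budget (hypothetical)."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        # Predictor name -> size in MB, ordered from least to most recently used.
        self.loaded = OrderedDict()

    def used_mb(self):
        return sum(self.loaded.values())

    def infer(self, name, size_mb):
        """Handle a request for a predictor; return the names evicted to make room."""
        evicted = []
        if name in self.loaded:
            # Already loaded: just mark it as most recently used.
            self.loaded.move_to_end(name)
            return evicted
        if size_mb > self.capacity_mb:
            raise ValueError(f"{name} does not fit in this pod at all")
        # Assumed policy: evict least recently used predictors until the new one fits.
        while self.used_mb() + size_mb > self.capacity_mb:
            victim, _ = self.loaded.popitem(last=False)
            evicted.append(victim)
        self.loaded[name] = size_mb
        return evicted


pod = ToyRuntimePod(capacity_mb=100)
pod.infer("a", 40)
pod.infer("b", 40)
pod.infer("a", 40)          # "a" becomes the most recently used
print(pod.infer("c", 40))   # "b" is the least recently used, so it is evicted
```

In this simplified picture, adding more runtime pods raises the total memory available for loaded predictors, which is why scaling is discussed at the pod level rather than per predictor.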

Here are a few links to other issues that might be helpful:

https://github.com/kserve/modelmesh/issues/46
https://github.com/kserve/modelmesh-serving/issues/329
https://github.com/kserve/modelmesh-serving/issues/225#issuecomment-1244172486
https://github.com/kserve/modelmesh-serving/issues/374#issuecomment-1562941339
https://github.com/kserve/modelmesh-serving/issues/331#issuecomment-1435436756
https://github.com/kserve/modelmesh-serving/issues/330#issuecomment-1435437958

There might be an opportunity to collect some of that information and add it to the ModelMesh docs.

Sep 22 '23 17:09 ckadner

Hi @ckadner, @rafvasq, @Jooho, I wanted to ask whether there is any ongoing work to improve the docs on HPA for modelmesh-serving. It's just not clear from the current docs.

Thank you

Feb 26 '24 16:02 juanma9613

I think it is a matter of collecting some info from past issues, PRs, PR review comments, etc. If you want to start a PR, we could collectively add to it.

Feb 27 '24 06:02 ckadner
