How to use ServingRuntime autoscaling?

Open andreapairon opened this issue 1 year ago • 8 comments

Hi all,

What is the correct way to use HPA autoscaling on a ServingRuntime? Should I remove the replicas property under spec? Should I update all the YAML files involved in the "Enable HPA..." commit, or use another version of the ModelMesh controller image?

It's not easy to work out how to use autoscaling from the scaling documentation page.

andreapairon avatar May 17 '23 13:05 andreapairon

@Jooho -- there may be a need to update our docs to clear up some of the confusion :-)

ckadner avatar May 24 '23 02:05 ckadner

Thank you. In the meantime, can you tell me how to activate autoscaling for the serving runtimes? :D

andreapairon avatar May 24 '23 08:05 andreapairon

@andreapairon

I'm sorry for the confusion, and thank you for the questions. I will answer each of them in turn.

What is the correct way to use HPA autoscaling on a ServingRuntime?

To enable HPA, add this annotation to the specific ServingRuntime. This is an example:

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    serving.kserve.io/autoscalerClass: hpa

Should I remove the replicas property under spec?

Correct. To enable HPA, you have to remove the replicas field from the ServingRuntime spec.
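Putting the two answers above together, a minimal sketch of a ServingRuntime with HPA enabled might look like the following. The runtime name, model format, container name, image, and resource values are placeholders for illustration only, and the remaining fields of a real runtime are omitted.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-runtime                      # hypothetical name
  annotations:
    serving.kserve.io/autoscalerClass: hpa   # from the example above
spec:
  # Note: no "replicas" field here; it has to be removed when HPA is enabled.
  multiModel: true
  supportedModelFormats:
    - name: example-format                   # placeholder model format
      version: "1"
      autoSelect: true
  containers:
    - name: example-server                   # placeholder container
      image: example.com/model-server:latest # placeholder image
      resources:
        # Utilization-based HPA metrics are computed relative to requests,
        # so the container should declare them.
        requests:
          cpu: 500m
          memory: 1Gi
```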

Should I update all the YAML files involved in the "Enable HPA..." commit, or use another version of the ModelMesh controller image?

HPA support relies on a webhook, so you have to update all of the YAML files.

Additional comments: By default, HPA-specific features are managed through annotations on the ServingRuntime. This differs from kserve/kserve, where they are managed through annotations or the predictor spec of the InferenceService. The reason is that, by design, multiple models in kserve/modelmesh share a single ServingRuntime. HPA is a standard object provided by Kubernetes, and ModelMesh relies on this HPA object to autoscale the ServingRuntime pods.
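As an illustration of annotation-based configuration, the sketch below adds tuning annotations next to the autoscaler class. Only `serving.kserve.io/autoscalerClass` is confirmed in this thread; the other annotation keys are written as I recall them from the ModelMesh scaling documentation, and the values are arbitrary, so treat them as assumptions and check them against the version you deploy.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-runtime                                  # hypothetical name
  annotations:
    serving.kserve.io/autoscalerClass: hpa               # confirmed above
    # The keys below are assumptions recalled from the scaling docs;
    # verify them for your ModelMesh version. Values are examples only.
    serving.kserve.io/metrics: "cpu"                     # metric to scale on
    serving.kserve.io/targetUtilizationPercentage: "75"  # target utilization
    serving.kserve.io/min-scale: "2"                     # lower bound on pods
    serving.kserve.io/max-scale: "3"                     # upper bound on pods
```

Annotation values are always strings in Kubernetes, which is why the numbers are quoted.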

If you have further questions, please let me know.

Jooho avatar May 25 '23 13:05 Jooho

@andreapairon -- when you get a chance to try it out, would you be willing to open a PR to update our docs?

ckadner avatar May 26 '23 00:05 ckadner

@ckadner --- yeah, I'll do that.

@Jooho --- But to enable HPA, is a Knative installation necessary as well? Or is the standalone installation of KServe ModelMesh Serving enough?

andreapairon avatar May 30 '23 14:05 andreapairon

@andreapairon No, it does not need a Knative installation, but the cluster has to support metrics.

Jooho avatar Jun 01 '23 13:06 Jooho

We have to update the Go code for the latest Kubernetes versions: OpenShift v4.13 and K8s v1.26 no longer serve the v2beta2 version of the HPA API (deprecated).

W0713 21:57:33.276930       1 warnings.go:70] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
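For reference, an HPA object written against the stable `autoscaling/v2` API named in the warning looks roughly like this. It is only a sketch of the API shape: ModelMesh creates and manages the HPA itself, and the object name, target Deployment name, and numbers below are hypothetical.

```yaml
apiVersion: autoscaling/v2              # replaces autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-runtime-hpa             # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-runtime-deployment    # hypothetical target Deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75        # example target, in percent of requests
```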

ckadner avatar Jun 20 '23 20:06 ckadner

That's completed now with #403, but this issue is more about better documenting how to use autoscaling, right? Maybe we can either open a new issue about improving the autoscaling documentation, or repurpose/rename this one to better capture that.

rafvasq avatar Jul 20 '23 16:07 rafvasq