modelmesh-serving

ServingRuntime autoscaling monitoring GPU utilization

Open andreapairon opened this issue 1 year ago • 6 comments

Hi all,

I noticed on the scaling doc page (https://github.com/kserve/modelmesh-serving/blob/main/docs/production-use/scaling.md) that it is now possible to autoscale the ServingRuntime with HPA, but only using metrics based on CPU utilization. Is it possible to scale the ServingRuntime using GPU-related metrics?

Thanks in advance

andreapairon avatar May 15 '23 10:05 andreapairon

Not currently.

https://github.com/kserve/modelmesh-serving/issues/329#issuecomment-1442742323

// AutoscalerAllowedMetricsList defines the metric types the ServingRuntime autoscaler accepts: only CPU and memory.
var AutoscalerAllowedMetricsList = []AutoscalerMetricsType{
	AutoScalerMetricsCPU,
	AutoScalerMetricsMemory,
}

FYI @Jooho @njhill

ckadner avatar May 24 '23 02:05 ckadner

@andreapairon As @ckadner mentioned, GPU-related metrics cannot be used. HPA is a standard object provided by Kubernetes, and ModelMesh relies on this HPA object to autoscale the ServingRuntime Pods.
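
For context, here is a minimal sketch of the kind of CPU-utilization HPA that ModelMesh relies on today, expressed with the Kubernetes autoscaling/v2 Go types. The object name, replica bounds, and 75% target are illustrative values, not the controller's actual defaults:

```go
package autoscaler

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newCPUUtilizationHPA builds an HPA that scales a ServingRuntime Deployment on
// average CPU utilization, i.e. the kind of object ModelMesh manages today.
// The deployment name, replica bounds, and target are illustrative only.
func newCPUUtilizationHPA() *autoscalingv2.HorizontalPodAutoscaler {
	minReplicas := int32(1)
	targetUtilization := int32(75)
	return &autoscalingv2.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "modelmesh-serving-example-runtime"},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "modelmesh-serving-example-runtime",
			},
			MinReplicas: &minReplicas,
			MaxReplicas: 4,
			Metrics: []autoscalingv2.MetricSpec{{
				// Resource metrics are the only type permitted by the
				// allowed-metrics list quoted above (CPU or memory).
				Type: autoscalingv2.ResourceMetricSourceType,
				Resource: &autoscalingv2.ResourceMetricSource{
					Name: corev1.ResourceCPU,
					Target: autoscalingv2.MetricTarget{
						Type:               autoscalingv2.UtilizationMetricType,
						AverageUtilization: &targetUtilization,
					},
				},
			}},
		},
	}
}
```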

Jooho avatar May 25 '23 13:05 Jooho

@Jooho @ckadner, according to the HPA docs, since k8s v1.23 the autoscaling/v2 API supports custom metrics. I'm not sure what would be involved in exposing this through modelmesh, however.
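
For illustration, here is a minimal sketch of what a custom-metric MetricSpec could look like with the autoscaling/v2 Go types, assuming a per-pod GPU utilization metric (the DCGM metric name DCGM_FI_DEV_GPU_UTIL is used here) is already exposed through a custom metrics adapter such as prometheus-adapter. The metric name and target are assumptions for the sketch, not something ModelMesh supports today:

```go
package autoscaler

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	"k8s.io/apimachinery/pkg/api/resource"
)

// gpuUtilizationMetricSpec sketches a Pods-type custom metric that an HPA could
// use once a per-pod GPU utilization metric is served by a custom metrics
// adapter (e.g. DCGM exporter + prometheus-adapter). Both the metric name and
// the 80 average-value target are hypothetical examples.
func gpuUtilizationMetricSpec() autoscalingv2.MetricSpec {
	return autoscalingv2.MetricSpec{
		Type: autoscalingv2.PodsMetricSourceType,
		Pods: &autoscalingv2.PodsMetricSource{
			Metric: autoscalingv2.MetricIdentifier{
				Name: "DCGM_FI_DEV_GPU_UTIL",
			},
			Target: autoscalingv2.MetricTarget{
				Type:         autoscalingv2.AverageValueMetricType,
				AverageValue: resource.NewQuantity(80, resource.DecimalSI),
			},
		},
	}
}
```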

dagrayvid avatar May 26 '23 00:05 dagrayvid

@dagrayvid I will look into this soon

Jooho avatar May 29 '23 22:05 Jooho

@Jooho Is there any new information regarding this question? We would also like to use custom metrics that are not related to CPU or memory.

GolanLevy avatar Sep 06 '23 07:09 GolanLevy

@GolanLevy At the moment, ModelMesh only supports CPU and memory metrics. However, custom metrics are supported by HPA, so this could be added as an enhancement. Contributions are always welcome.

Jooho avatar Oct 02 '23 12:10 Jooho