modelmesh-serving

ServingRuntime autoscaling monitoring GPU utilization

Open andreapairon opened this issue 1 year ago • 6 comments

Hi all,

I noticed on the scaling doc page (https://github.com/kserve/modelmesh-serving/blob/main/docs/production-use/scaling.md) that it is now possible to autoscale the ServingRuntime with HPA, but only using metrics based on CPU utilization. Is it possible to scale the ServingRuntime using GPU-related metrics?

Thanks in advance

andreapairon avatar May 15 '23 10:05 andreapairon

Not currently.

https://github.com/kserve/modelmesh-serving/issues/329#issuecomment-1442742323

// AutoscalerAllowedMetricsList defines the metric types the ServingRuntime autoscaler accepts: only CPU and memory.
var AutoscalerAllowedMetricsList = []AutoscalerMetricsType{
	AutoScalerMetricsCPU,
	AutoScalerMetricsMemory,
}

FYI @Jooho @njhill

ckadner avatar May 24 '23 02:05 ckadner

@andreapairon As @ckadner mentioned, GPU-related metrics cannot be used. HPA is a standard object provided by Kubernetes, and ModelMesh relies on this HPA object to autoscale the ServingRuntime Pods.
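
For context, here is a minimal sketch of the kind of CPU-utilization HPA that ModelMesh relies on today, expressed with the Kubernetes autoscaling/v2 Go types. The object name, replica bounds, and 75% target are illustrative values, not the controller's actual defaults:

```go
package autoscaler

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newCPUUtilizationHPA builds an HPA that scales a ServingRuntime Deployment on
// average CPU utilization, i.e. the kind of object ModelMesh manages today.
// The deployment name, replica bounds, and target are illustrative only.
func newCPUUtilizationHPA() *autoscalingv2.HorizontalPodAutoscaler {
	minReplicas := int32(1)
	targetUtilization := int32(75)
	return &autoscalingv2.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "modelmesh-serving-example-runtime"},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "modelmesh-serving-example-runtime",
			},
			MinReplicas: &minReplicas,
			MaxReplicas: 4,
			Metrics: []autoscalingv2.MetricSpec{{
				// Resource metrics are the only type permitted by the
				// allowed-metrics list quoted above (CPU or memory).
				Type: autoscalingv2.ResourceMetricSourceType,
				Resource: &autoscalingv2.ResourceMetricSource{
					Name: corev1.ResourceCPU,
					Target: autoscalingv2.MetricTarget{
						Type:               autoscalingv2.UtilizationMetricType,
						AverageUtilization: &targetUtilization,
					},
				},
			}},
		},
	}
}
```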

Jooho avatar May 25 '23 13:05 Jooho

@Jooho @ckadner, according to the HPA docs, since k8s v1.23 the autoscaling/v2 API supports custom metrics. I'm not sure what would be involved in exposing this through modelmesh, however.
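
For illustration, here is a minimal sketch of what a custom-metric MetricSpec could look like with the autoscaling/v2 Go types, assuming a per-pod GPU utilization metric (the DCGM metric name DCGM_FI_DEV_GPU_UTIL is used here) is already exposed through a custom metrics adapter such as prometheus-adapter. The metric name and target are assumptions for the sketch, not something ModelMesh supports today:

```go
package autoscaler

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	"k8s.io/apimachinery/pkg/api/resource"
)

// gpuUtilizationMetricSpec sketches a Pods-type custom metric that an HPA could
// use once a per-pod GPU utilization metric is served by a custom metrics
// adapter (e.g. DCGM exporter + prometheus-adapter). Both the metric name and
// the 80 average-value target are hypothetical examples.
func gpuUtilizationMetricSpec() autoscalingv2.MetricSpec {
	return autoscalingv2.MetricSpec{
		Type: autoscalingv2.PodsMetricSourceType,
		Pods: &autoscalingv2.PodsMetricSource{
			Metric: autoscalingv2.MetricIdentifier{
				Name: "DCGM_FI_DEV_GPU_UTIL",
			},
			Target: autoscalingv2.MetricTarget{
				Type:         autoscalingv2.AverageValueMetricType,
				AverageValue: resource.NewQuantity(80, resource.DecimalSI),
			},
		},
	}
}
```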

dagrayvid avatar May 26 '23 00:05 dagrayvid

@dagrayvid I will look into this soon

Jooho avatar May 29 '23 22:05 Jooho

@Jooho Is there any new information regarding this question? We would also like to use custom metrics that are not related to CPU or memory.

GolanLevy avatar Sep 06 '23 07:09 GolanLevy

@GolanLevy At the moment, ModelMesh only supports CPU and memory metrics. However, custom metrics are supported by HPA, so this could be added as an enhancement. Contributions are always welcome.

Jooho avatar Oct 02 '23 12:10 Jooho