modelmesh-serving
ServingRuntime autoscaling monitoring GPU utilization
Hi all,
I noticed in the scaling doc page (https://github.com/kserve/modelmesh-serving/blob/main/docs/production-use/scaling.md) that it is now possible to set up ServingRuntime autoscaling with HPA, but with metrics based on CPU utilization.
Is it possible to scale the ServingRuntime using GPU metrics?
Thanks in advance
Not currently.
https://github.com/kserve/modelmesh-serving/issues/329#issuecomment-1442742323
// Autoscaler Metrics Allowed List
var AutoscalerAllowedMetricsList = []AutoscalerMetricsType{
	AutoScalerMetricsCPU,
	AutoScalerMetricsMemory,
}
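To make the restriction concrete, here is a minimal, self-contained sketch of how an allow-list like the one quoted above could reject a GPU metric. The type definition, the string values "cpu" and "memory", and the helper isAllowedMetric are assumptions made for illustration; they are not taken from the ModelMesh or KServe source.

```go
package main

import "fmt"

// AutoscalerMetricsType names a metric the autoscaler may target.
// Assumption: modeled as a string type for this sketch.
type AutoscalerMetricsType string

// Assumption: the string values below are illustrative only.
const (
	AutoScalerMetricsCPU    AutoscalerMetricsType = "cpu"
	AutoScalerMetricsMemory AutoscalerMetricsType = "memory"
)

// AutoscalerAllowedMetricsList mirrors the list quoted in the thread.
var AutoscalerAllowedMetricsList = []AutoscalerMetricsType{
	AutoScalerMetricsCPU,
	AutoScalerMetricsMemory,
}

// isAllowedMetric is a hypothetical helper: it reports whether m
// appears in the allow-list.
func isAllowedMetric(m AutoscalerMetricsType) bool {
	for _, allowed := range AutoscalerAllowedMetricsList {
		if m == allowed {
			return true
		}
	}
	return false
}

func main() {
	// "gpu" is not in the list, so it is reported as not allowed.
	for _, m := range []AutoscalerMetricsType{"cpu", "memory", "gpu"} {
		fmt.Printf("%s allowed: %t\n", m, isAllowedMetric(m))
	}
}
```

Under these assumptions, supporting a GPU metric would mean at least extending the list, plus whatever HPA wiring the controller needs for a non-resource metric.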
FYI @Jooho @njhill
@andreapairon As @ckadner mentioned, GPU-related metrics cannot be used. HPA is a built-in Kubernetes object, and ModelMesh relies on it to autoscale the ServingRuntime Pods.
@Jooho @ckadner, according to the HPA docs, since k8s v1.23 the autoscaling/v2 API supports custom metrics. I'm not sure what would be involved in exposing this through modelmesh, however.
@dagrayvid I will look into this soon
@Jooho Is there any update on this question? We would also like to use custom metrics that are not related to CPU or memory.
@GolanLevy At the moment, ModelMesh supports only CPU and memory metrics. However, HPA itself supports custom metrics, so this could be added as an enhancement. Contributions are always welcome.