modelmesh-serving
How to Control the Number of Model Replicas in ModelMesh Serving
Description
I am working with ModelMesh Serving deployed on a Kubernetes cluster, and I am looking for a way to control the number of replicas (copies) of a specific model. My setup uses the Triton runtime with two pods, and I am serving a mobilenet model. I want to be able to set the number of model replicas to a specific value.
Cluster State:
The state of pods in my cluster is as follows:
NAME                                           READY   STATUS    RESTARTS   AGE
etcd-bcc445f46-gnmw6                           1/1     Running   0          2d21h
minio-67577699d-frm4s                          1/1     Running   0          2d21h
modelmesh-controller-5fd6b98c4f-h4njm          1/1     Running   0          65s
modelmesh-serving-triton-2.x-9849f97c6-54gh7   4/4     Running   0          18s
modelmesh-serving-triton-2.x-9849f97c6-qndvd   4/4     Running   0          18s
traefik-78db748568-cmn4x                       1/1     Running   0          2d21h
InferenceService status:
NAME                     URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
example-mobilenet-isvc   grpc://modelmesh-serving.modelmesh-serving:8033   True                                                                  40s
The InferenceService for mobilenet (example-mobilenet-isvc) has minReplicas set to 2, as shown in the description below:
Name:         example-mobilenet-isvc
Namespace:    modelmesh-serving
Labels:       <none>
Annotations:  serving.kserve.io/deploymentMode: ModelMesh
API Version:  serving.kserve.io/v1beta1
Kind:         InferenceService
Metadata:
  Creation Timestamp:  2024-04-18T03:01:25Z
  Generation:          1
  Resource Version:    454691
  UID:                 f34abe33-606f-4fbd-95e4-a67829f7dac0
Spec:
  Predictor:
    Min Replicas:  2
    Model:
      Model Format:
        Name:   onnx
      Runtime:  triton-2.x
      Storage:
        Key:  minio
        Parameters:
          Bucket:  modelmesh-serving
        Path:      mobilenetv2-7.onnx
Status:
  Components:
    Predictor:
      Grpc URL:  grpc://modelmesh-serving.modelmesh-serving:8033
      Rest URL:  http://modelmesh-serving.modelmesh-serving:8008
      URL:       grpc://modelmesh-serving.modelmesh-serving:8033
  Conditions:
    Last Transition Time:  2024-04-18T03:01:40Z
    Status:                True
    Type:                  PredictorReady
    Last Transition Time:  2024-04-18T03:01:40Z
    Status:                True
    Type:                  Ready
  Model Status:
    Copies:
      Failed Copies:  0
      Total Copies:   1
    States:
      Active Model State:  Loaded
      Target Model State:
    Transition Status:     UpToDate
  URL:                     grpc://modelmesh-serving.modelmesh-serving:8033
Events:  <none>
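To check programmatically whether the number of loaded copies matches the requested minReplicas, one could read the InferenceService as JSON (for example with `kubectl get isvc example-mobilenet-isvc -n modelmesh-serving -o json`) and compare the two fields. This is a minimal sketch, assuming the JSON field names `spec.predictor.minReplicas` and `status.modelStatus.copies.totalCopies` (the camelCase forms of the fields shown in the describe output above); the embedded JSON is a hand-trimmed excerpt mirroring that output, and `copies_shortfall` is a hypothetical helper, not part of any ModelMesh or KServe tooling.

```python
import json

# Hand-trimmed excerpt mirroring the describe output above; only the fields
# used below are included (assumed camelCase JSON names).
ISVC_JSON = """
{
  "metadata": {"name": "example-mobilenet-isvc"},
  "spec": {"predictor": {"minReplicas": 2}},
  "status": {"modelStatus": {"copies": {"failedCopies": 0, "totalCopies": 1}}}
}
"""


def copies_shortfall(isvc: dict) -> int:
    """Hypothetical helper: copies still missing relative to minReplicas.

    totalCopies is treated as the number of copies currently loaded;
    missing fields default to 0.
    """
    wanted = isvc.get("spec", {}).get("predictor", {}).get("minReplicas", 0)
    copies = isvc.get("status", {}).get("modelStatus", {}).get("copies", {})
    loaded = copies.get("totalCopies", 0)
    return max(wanted - loaded, 0)


if __name__ == "__main__":
    isvc = json.loads(ISVC_JSON)
    # prints "example-mobilenet-isvc shortfall: 1"
    print(isvc["metadata"]["name"], "shortfall:", copies_shortfall(isvc))
```

This only reports the mismatch between the two fields; whether ModelMesh uses minReplicas to decide per-model copies at all is the open question of this issue.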
ETCD Keys and Values:
The relevant etcd data suggests that only one copy of the model is active, judging by the instanceIds and count fields:
{"hostname":"10.244.0.174","instanceId":"9f97c6-qndvd","port":8080,"version":"20230801-7b484","registrationTime":1713409288745,"connConfig":{"transport.tprotocol.factory":"org.apache.thrift.protocol.TCompactProtocol$Factory","transport.framed":"false","transport.ssl.enabled":"false","transport.extrainfo_supported":"true","service.class":"com.ibm.watson.modelmesh.thrift.ModelMeshService","methodinfo.applyModelMulti":"idp=t","methodinfo.applyModel":"idp=t","app.kv_store_type":"etcd"}}
/litelinks/modelmesh-serving/10.244.0.175_8080_18eef26ea30
{"hostname":"10.244.0.175","instanceId":"9f97c6-54gh7","port":8080,"version":"20230801-7b484","registrationTime":1713409288755,"connConfig":{"transport.tprotocol.factory":"org.apache.thrift.protocol.TCompactProtocol$Factory","transport.framed":"false","transport.ssl.enabled":"false","transport.extrainfo_supported":"true","service.class":"com.ibm.watson.modelmesh.thrift.ModelMeshService","methodinfo.applyModel":"idp=t","methodinfo.applyModelMulti":"idp=t","app.kv_store_type":"etcd"}}
/mm/modelmesh-serving/instances/9f97c6-54gh7
{"startTime":1713409287610,"loc":"172.18.0.2","labels":["mt:keras","mt:keras:2","mt:onnx","mt:onnx:1","mt:pytorch","mt:pytorch:1","mt:tensorflow","mt:tensorflow:1","mt:tensorflow:2","mt:tensorrt","mt:tensorrt:7","pv:grpc-v2","pv:v2","rt:triton-2.x"],"actionable":true,"lruTime":1713407522245,"count":1,"cap":48661,"used":123,"lThreads":2,"lInProg":1}
/mm/modelmesh-serving/instances/9f97c6-qndvd
{"startTime":1713409287621,"loc":"172.18.0.2","labels":["mt:keras","mt:keras:2","mt:onnx","mt:onnx:1","mt:pytorch","mt:pytorch:1","mt:tensorflow","mt:tensorflow:1","mt:tensorflow:2","mt:tensorrt","mt:tensorrt:7","pv:grpc-v2","pv:v2","rt:triton-2.x"],"actionable":true,"lruTime":1713407522368,"count":1,"cap":48661,"used":2174,"lThreads":2}
/mm/modelmesh-serving/leaderLatch/_9f97c6-54gh7
_9f97c6-54gh7
/mm/modelmesh-serving/leaderLatch/_9f97c6-qndvd
_9f97c6-qndvd
/mm/modelmesh-serving/registry/example-mobilenet-isvc__isvc-0b5941bbd0
{"type":"rt:triton-2.x","encKey":"{\"storage_key\":\"minio\",\"storage_params\":{\"bucket\":\"modelmesh-serving\"},\"model_type\":{\"name\":\"onnx\"}}","mPath":"mobilenetv2-7.onnx","autoDel":true,"instanceIds":{"9f97c6-qndvd":1713409297527},"refs":1,"lu":1713407522368}
/mm/modelmesh-serving/vmodels/example-mobilenet-isvc
{"o":"isvc","amid":"example-mobilenet-isvc__isvc-0b5941bbd0","tmid":"example-mobilenet-isvc__isvc-0b5941bbd0"}
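To see which runtime pod actually holds the copy, the registry entry's instanceIds can be matched against the pod names. In this cluster each instanceId (e.g. `9f97c6-qndvd`) appears to be the tail of a pod name (`...-9849f97c6-qndvd`). The sketch below relies on that observation, which is an assumption rather than a documented ModelMesh contract; `pods_with_copy` is a hypothetical helper, and the inputs are copied (trimmed) from the outputs above.

```python
import json

# Runtime pod names from the `kubectl get pods` output above.
RUNTIME_PODS = [
    "modelmesh-serving-triton-2.x-9849f97c6-54gh7",
    "modelmesh-serving-triton-2.x-9849f97c6-qndvd",
]

# Value of the /mm/modelmesh-serving/registry/... key above, trimmed to a few fields.
REGISTRY_VALUE = (
    '{"type":"rt:triton-2.x","mPath":"mobilenetv2-7.onnx",'
    '"instanceIds":{"9f97c6-qndvd":1713409297527},"refs":1}'
)


def pods_with_copy(registry_value: str, pods: list[str]) -> dict[str, bool]:
    """Hypothetical helper: map each pod to whether the registry lists a copy on it.

    Assumption: an instanceId belongs to a pod when the pod name ends with it.
    """
    instance_ids = json.loads(registry_value).get("instanceIds", {})
    return {pod: any(pod.endswith(iid) for iid in instance_ids) for pod in pods}


if __name__ == "__main__":
    placement = pods_with_copy(REGISTRY_VALUE, RUNTIME_PODS)
    for pod, has_copy in placement.items():
        print(f"{pod}: {'copy loaded' if has_copy else 'no copy'}")
    print("total copies:", sum(placement.values()))
```

With the data above, only the `qndvd` pod is reported as holding a copy, which agrees with `Total Copies: 1` in the InferenceService status.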
Question:
How can one ensure that ModelMesh Serving honors the minReplicas setting for a specific model? The documentation does not seem to explain in depth how individual model replicas are scaled across the serving pods. Is there a way to control the number of model replicas in ModelMesh Serving?
Hi @michael-nammi, have you found a solution?
I found this doc in the model-mesh repository. Hope it helps.