modelmesh-serving How to Control the Number of Model Replicas in ModelMesh Serving

Description

I am running ModelMesh Serving on a Kubernetes cluster and am looking for a way to control the number of replicas of a specific model. My setup uses the Triton runtime with two pods and serves a single model, mobilenet. I want the number of loaded copies of that model to match a value that I configure.

Cluster State:

The state of pods in my cluster is as follows:

NAME                                           READY   STATUS    RESTARTS   AGE
etcd-bcc445f46-gnmw6                           1/1     Running   0          2d21h
minio-67577699d-frm4s                          1/1     Running   0          2d21h
modelmesh-controller-5fd6b98c4f-h4njm          1/1     Running   0          65s
modelmesh-serving-triton-2.x-9849f97c6-54gh7   4/4     Running   0          18s
modelmesh-serving-triton-2.x-9849f97c6-qndvd   4/4     Running   0          18s
traefik-78db748568-cmn4x                       1/1     Running   0          2d21h

InferenceService Status:

NAME                     URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
example-mobilenet-isvc   grpc://modelmesh-serving.modelmesh-serving:8033   True                                                                  40s

The InferenceService for mobilenet (example-mobilenet-isvc) has minReplicas set to 2, but its status reports only one loaded copy (Total Copies: 1), as shown in the description below:

Name:         example-mobilenet-isvc
Namespace:    modelmesh-serving
Labels:       <none>
Annotations:  serving.kserve.io/deploymentMode: ModelMesh
API Version:  serving.kserve.io/v1beta1
Kind:         InferenceService
Metadata:
  Creation Timestamp:  2024-04-18T03:01:25Z
  Generation:          1
  Resource Version:    454691
  UID:                 f34abe33-606f-4fbd-95e4-a67829f7dac0
Spec:
  Predictor:
    Min Replicas:  2
    Model:
      Model Format:
        Name:   onnx
      Runtime:  triton-2.x
      Storage:
        Key:  minio
        Parameters:
          Bucket:  modelmesh-serving
        Path:      mobilenetv2-7.onnx
Status:
  Components:
    Predictor:
      Grpc URL:  grpc://modelmesh-serving.modelmesh-serving:8033
      Rest URL:  http://modelmesh-serving.modelmesh-serving:8008
      URL:       grpc://modelmesh-serving.modelmesh-serving:8033
  Conditions:
    Last Transition Time:  2024-04-18T03:01:40Z
    Status:                True
    Type:                  PredictorReady
    Last Transition Time:  2024-04-18T03:01:40Z
    Status:                True
    Type:                  Ready
  Model Status:
    Copies:
      Failed Copies:  0
      Total Copies:   1
    States:
      Active Model State:  Loaded
      Target Model State:
    Transition Status:     UpToDate
  URL:                     grpc://modelmesh-serving.modelmesh-serving:8033
Events:                    <none>
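
To make the mismatch explicit, here is a minimal Python sketch that compares the requested minimum with the reported copy count. The dictionary is a hand-written subset of the resource above; the camelCase field names (minReplicas, modelStatus, copies, totalCopies) are inferred from the describe output and are an assumption, and replica_gap is a hypothetical helper, not part of any ModelMesh tooling.

```python
# Hand-written subset of the InferenceService shown above, shaped the way it
# would presumably appear in `kubectl get inferenceservice ... -o json`.
# The camelCase field names are inferred from the describe output (assumption).
isvc = {
    "spec": {"predictor": {"minReplicas": 2}},
    "status": {"modelStatus": {"copies": {"failedCopies": 0, "totalCopies": 1}}},
}


def replica_gap(resource: dict) -> int:
    """Return how many copies are missing relative to minReplicas (0 if none)."""
    requested = resource["spec"]["predictor"].get("minReplicas", 0)
    loaded = resource["status"]["modelStatus"]["copies"]["totalCopies"]
    return max(0, requested - loaded)


gap = replica_gap(isvc)
print(f"missing copies: {gap}")
```

With the values from my cluster, one copy is missing relative to minReplicas.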

ETCD Keys and Values:

The relevant data from etcd suggests that only one replica of the model is active, judging by the instanceIds and count fields:

{"hostname":"10.244.0.174","instanceId":"9f97c6-qndvd","port":8080,"version":"20230801-7b484","registrationTime":1713409288745,"connConfig":{"transport.tprotocol.factory":"org.apache.thrift.protocol.TCompactProtocol$Factory","transport.framed":"false","transport.ssl.enabled":"false","transport.extrainfo_supported":"true","service.class":"com.ibm.watson.modelmesh.thrift.ModelMeshService","methodinfo.applyModelMulti":"idp=t","methodinfo.applyModel":"idp=t","app.kv_store_type":"etcd"}}
/litelinks/modelmesh-serving/10.244.0.175_8080_18eef26ea30
{"hostname":"10.244.0.175","instanceId":"9f97c6-54gh7","port":8080,"version":"20230801-7b484","registrationTime":1713409288755,"connConfig":{"transport.tprotocol.factory":"org.apache.thrift.protocol.TCompactProtocol$Factory","transport.framed":"false","transport.ssl.enabled":"false","transport.extrainfo_supported":"true","service.class":"com.ibm.watson.modelmesh.thrift.ModelMeshService","methodinfo.applyModel":"idp=t","methodinfo.applyModelMulti":"idp=t","app.kv_store_type":"etcd"}}
/mm/modelmesh-serving/instances/9f97c6-54gh7
{"startTime":1713409287610,"loc":"172.18.0.2","labels":["mt:keras","mt:keras:2","mt:onnx","mt:onnx:1","mt:pytorch","mt:pytorch:1","mt:tensorflow","mt:tensorflow:1","mt:tensorflow:2","mt:tensorrt","mt:tensorrt:7","pv:grpc-v2","pv:v2","rt:triton-2.x"],"actionable":true,"lruTime":1713407522245,"count":1,"cap":48661,"used":123,"lThreads":2,"lInProg":1}
/mm/modelmesh-serving/instances/9f97c6-qndvd
{"startTime":1713409287621,"loc":"172.18.0.2","labels":["mt:keras","mt:keras:2","mt:onnx","mt:onnx:1","mt:pytorch","mt:pytorch:1","mt:tensorflow","mt:tensorflow:1","mt:tensorflow:2","mt:tensorrt","mt:tensorrt:7","pv:grpc-v2","pv:v2","rt:triton-2.x"],"actionable":true,"lruTime":1713407522368,"count":1,"cap":48661,"used":2174,"lThreads":2}
/mm/modelmesh-serving/leaderLatch/_9f97c6-54gh7
_9f97c6-54gh7
/mm/modelmesh-serving/leaderLatch/_9f97c6-qndvd
_9f97c6-qndvd
/mm/modelmesh-serving/registry/example-mobilenet-isvc__isvc-0b5941bbd0
{"type":"rt:triton-2.x","encKey":"{\"storage_key\":\"minio\",\"storage_params\":{\"bucket\":\"modelmesh-serving\"},\"model_type\":{\"name\":\"onnx\"}}","mPath":"mobilenetv2-7.onnx","autoDel":true,"instanceIds":{"9f97c6-qndvd":1713409297527},"refs":1,"lu":1713407522368}
/mm/modelmesh-serving/vmodels/example-mobilenet-isvc
{"o":"isvc","amid":"example-mobilenet-isvc__isvc-0b5941bbd0","tmid":"example-mobilenet-isvc__isvc-0b5941bbd0"}
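
For reference, this is roughly how I read the copy count out of the registry entry. It is a minimal Python sketch that parses the value of the registry key copied from the dump above. Treating each key of instanceIds as one serving instance that holds a copy of the model is my interpretation of the record, not something I found documented, and loaded_instances is a hypothetical helper name.

```python
import json

# Value of /mm/modelmesh-serving/registry/example-mobilenet-isvc__isvc-0b5941bbd0,
# copied verbatim from the etcd dump above.
registry_value = r'''{"type":"rt:triton-2.x","encKey":"{\"storage_key\":\"minio\",\"storage_params\":{\"bucket\":\"modelmesh-serving\"},\"model_type\":{\"name\":\"onnx\"}}","mPath":"mobilenetv2-7.onnx","autoDel":true,"instanceIds":{"9f97c6-qndvd":1713409297527},"refs":1,"lu":1713407522368}'''


def loaded_instances(value: str) -> list[str]:
    """Return the instance IDs listed under instanceIds in a registry record.

    Assumption: each key of instanceIds is an instance holding a loaded copy.
    """
    record = json.loads(value)
    return sorted(record.get("instanceIds", {}))


instances = loaded_instances(registry_value)
copies = len(instances)
print(f"{copies} copy on: {', '.join(instances)}")
```

Only the instance 9f97c6-qndvd appears in the record, which matches the Total Copies: 1 reported in the InferenceService status.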

Question:

How can I make ModelMesh Serving honor the minReplicas setting for a specific model? The documentation does not seem to cover in any depth how individual model replicas are scaled across the serving pods. Is there a way to control the number of model replicas in ModelMesh Serving?

Apr 16 '24 05:04 michael-nammi

Hi @michael-nammi, have you found a solution?

Jul 21 '24 13:07 haiminh2001

I found this doc from the model-mesh repository. Hope this will help.

Sep 04 '24 07:09 haiminh2001
