modelmesh-serving
"Specified runtime name not recognised" error in Inference Service
Describe the bug
When I create one custom serving runtime (SR1) and one inference service (IS1) that is to be loaded onto SR1, I observe the expected behaviour: IS1 shows the failure message `Waiting for runtime Pod to become available` while the serving runtime is scaling up, and once the serving runtime pod is fully running, IS1 is immediately loaded and goes into `Ready` state.
However, if I create a second serving runtime (SR2) and a second inference service (IS2) that is to be loaded onto SR2, IS2 instead shows the failure message `Specified runtime name not recognized` while SR2 is scaling up. Once the SR2 pod has all containers in Running state, IS2 is not loaded immediately; it only gets loaded a few minutes later, when the ModelMesh controller's predictor reconciler picks it up again. This is unexpected, as I would think SR2 and IS2 should follow the same behaviour as SR1 and IS1.
To Reproduce
Serving Runtime and Inference Service templates used when observing this error:
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: mlserver-1
spec:
  builtInAdapter:
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
    runtimeManagementPort: 8001
    serverType: mlserver
  containers:
    - env:
        - name: MLSERVER_MODELS_DIR
          value: /models/
        - name: MLSERVER_GRPC_PORT
          value: "8001"
        - name: MLSERVER_HTTP_PORT
          value: "8002"
        - name: MLSERVER_LOAD_MODELS_AT_STARTUP
          value: "false"
        - name: MLSERVER_MODEL_NAME
          value: dummy-model
        - name: MLSERVER_HOST
          value: 127.0.0.1
        - name: MLSERVER_GRPC_MAX_MESSAGE_LENGTH
          value: "16777216"
        - name: MLSERVER_DEBUG
          value: "true"
        - name: MLSERVER_MODEL_PARALLEL_WORKERS
          value: "0"
        - name: MLSERVER_MODEL_IMPLEMENTATION
          value: mlserver_sklearn.SKLearnModel
      image: seldonio/mlserver:1.2.3
      name: mlserver
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: "2"
          memory: 1Gi
  grpcDataEndpoint: port:8001
  grpcEndpoint: port:8085
  multiModel: true
  replicas: 1
  storageHelper:
    disabled: false
  supportedModelFormats:
    - autoSelect: true
      name: custom
      version: "1"
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: default
  name: sklearn-1
spec:
  predictor:
    model:
      modelFormat:
        name: custom
        version: "1"
      name: ""
      resources: {}
      runtime: mlserver-1
      storageUri: <s3 uri>
```
The model used for the inference service is the SKLearn model trained on the MNIST dataset that is documented here.
Steps to reproduce the behavior:
- Within the same namespace, create one serving runtime with the above template. After the serving runtime deployment has been created, create one inference service with the above template.
- Wait for the serving runtime pod to have all containers in Running state and for the inference service to reach `Ready` state.
- Within the same namespace, create another serving runtime with the same template. Update the `spec.predictor.model.runtime` value in the inference service template to point at the new serving runtime and create a new inference service.
- The `Specified runtime name not recognized` error should appear in the status field of the new inference service (a rough command-line sketch of these steps follows below).
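A rough command-line sketch of the steps above (the file names, resource names, and namespace are illustrative, not part of the templates):
```sh
# First runtime/service pair: templates above (mlserver-1 / sklearn-1).
kubectl -n modelmesh-serving apply -f servingruntime-1.yaml
kubectl -n modelmesh-serving apply -f inferenceservice-1.yaml
kubectl -n modelmesh-serving get pods -w                          # wait for the SR1 pod to be Running
kubectl -n modelmesh-serving get inferenceservice sklearn-1 -w    # IS1 reaches Ready

# Second pair: same templates with the names and spec.predictor.model.runtime
# changed (e.g. mlserver-2 / sklearn-2).
kubectl -n modelmesh-serving apply -f servingruntime-2.yaml
kubectl -n modelmesh-serving apply -f inferenceservice-2.yaml
kubectl -n modelmesh-serving get inferenceservice sklearn-2 -o yaml   # status shows RuntimeNotRecognized
```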
Expected behavior
While the serving runtime is scaling up, the new inference service should show the failure message `Waiting for runtime Pod to become available` with reason `RuntimeUnhealthy`, rather than the message `Specified runtime name not recognized` with reason `RuntimeNotRecognized`. The inference service should also be updated and loaded immediately after the serving runtime has scaled up, rather than a couple of minutes after the serving runtime pod is ready.
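For completeness, this is roughly how the failure info can be checked (the `sklearn-2` name and namespace are placeholders, and the field path follows my understanding of the v1beta1 `modelStatus` layout, so verify against your cluster):
```sh
# Inspect the failure info reported on the new inference service (names are illustrative).
kubectl -n modelmesh-serving get inferenceservice sklearn-2 \
  -o jsonpath='{.status.modelStatus.lastFailureInfo}'
# Observed while SR2 scales up:
#   {"message":"Specified runtime name not recognized","reason":"RuntimeNotRecognized"}
# Expected (what SR1/IS1 showed):
#   {"message":"Waiting for runtime Pod to become available","reason":"RuntimeUnhealthy"}
```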
Screenshots
Nil
Environment (please complete the following information):
- OS: macOS
- Browser: Nil
- Version: KServe v0.9.0, Kubernetes v1.24.10

ModelMesh (v0.9.0) was installed on a fresh cluster using the `--quickstart` option.
Additional context
Some observations that might be related to the problem:
- When I monitor the contents of etcd during serving runtime and inference service creation (see the inspection sketch after this list), I noticed that for SR1 and IS1 the `instances` record was created before the `vmodels` and `registry` records. For SR2 and IS2, however, the `vmodels` and `registry` records were created the moment the inference service was created, while the `instances` record only appeared later, once the SR2 pod was reaching Running state. Could this be why the controller could not find any running instances of SR2 for IS2, leading to the `Specified runtime name not recognized` error?
- When the serving runtimes and inference services are deleted from the cluster, I noticed that if I delete the inference service before the serving runtime, the corresponding etcd records are fully deleted. However, if I delete the serving runtime first and then the inference service, the `vmodels` and `registry` records for the inference service are not deleted from etcd. Is this behaviour expected?
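For reference, this is roughly how I inspected the etcd records (the service name, namespace, and key layout are assumptions based on a `--quickstart` install; the actual connection details and key prefix are in the `model-serving-etcd` secret):
```sh
# Port-forward the quickstart etcd (service name assumed to be "etcd") and list keys.
kubectl -n modelmesh-serving port-forward svc/etcd 2379:2379 &
ETCDCTL_API=3 etcdctl --endpoints=localhost:2379 get --prefix "" --keys-only \
  | grep -E 'vmodels|registry|instances'
```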
Thanks again for opening up a very detailed issue!
In general, I don't think the controller is hardened for a use-case where many SRs are created dynamically alongside ISVCs. The typical usage is to (1) define all of the SRs that you plan to use and (2) create ISVCs as needed. I even consider `scaleToZero` to be mainly useful for dev/test in a resource-constrained environment.
The boot-up differences between SR1+ISVC1 and SR2+ISVC2 can be explained by the behavior of `scaleToZero` (which is on by default). After an initial install without any ISVCs, there are no Runtime pods up, which means there are no ModelMesh containers running; effectively, the ModelMesh cluster does not yet exist. Creation of the first ISVC causes the ServingRuntime to be scaled up and pods to spin up. Until one of the pods comes up, the ISVC cannot be registered because there is no MM container to connect to. This forces an ordering: ServingRuntime first, ISVC registration second.
When SR2+ISVC2 are created, ISVC2 can be registered using the existing MM containers in SR1. From your observations, the registration of SR2 happens only after one of its pods is able to spin up, so there is a time period when the ISVC is registered to use a Runtime that does not yet exist, which results in the `Specified runtime name not recognized` error.
So the fix would be to have the controller recognize that a ServingRuntime exists without having to wait for the pods to come up. I think the current behavior is that the controller just registers the ISVC with ModelMesh and grabs the error from ModelMesh, and ModelMesh doesn't know about the new Runtime until the pods come up.
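As an interim workaround, scale-to-zero can be disabled so that runtime pods are already up before any ISVC is created. A rough sketch using the `model-serving-config` user ConfigMap override (namespace assumed to be `modelmesh-serving`):
```sh
# Disable scale-to-zero via the user config override ConfigMap (a sketch, not tested here).
kubectl -n modelmesh-serving apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
data:
  config.yaml: |
    scaleToZero:
      enabled: false
EOF
```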
The second behavior that you observed is more interesting:
> However, if I delete the serving runtime first, then delete the inference service after, the `vmodels` and `registry` records for the inference service are not deleted from etcd. Is this behaviour expected?
I'll have to look into that. I think it is expected, since the registration of the ISVC is somewhat independent of the existence of a Runtime. It seems a bit strange that deleting the ISVC wouldn't cause the cleanup, but I think all of the data in etcd has an expiry, so the entries will eventually be removed. It might lead to weirdness if you recreated the Runtime before the entries expire 🤔. Actually, does this happen only when deleting the last ServingRuntime? Or can you delete SR2 while SR1 is still up and still observe this behavior for ISVC2? If it only happens when removing both ServingRuntimes, I think the problem is that the entries in etcd are only removed by a request to a ModelMesh container, so the controller can't clean up an existing ISVC correctly unless there is an SR pod still up.
@tjohnson31415 Thanks for the explanation! It was really helpful for understanding how the controller handles the registration of serving runtimes and inference services.
Unfortunately, my use case for ModelMesh requires that SRs can be created dynamically along with ISVCs, and that the ISVCs are loaded onto the SRs as quickly as possible. From my observations so far, it can take approximately 5 minutes after the SR pod has fully come up before the ISVC gets reconciled and goes into `Ready` state, and this lag between the SR pod being up and the ISVC being ready is the main issue for me. If it would be too much trouble to change the controller's behaviour to recognise that an SR exists without having to wait for the pods to come up, do you know if there is a way to shorten this lag and allow the ISVC to reach `Ready` state sooner after the SR pod is up?
Otherwise, to enable the controller to recognise that an SR exists without having to wait for the pods to come up, do you think the `tc-config` ConfigMap could be used to determine whether an SR exists? I noticed that the `type_constraints` within this ConfigMap gets updated immediately after the SR resource is created, without waiting for the SR pod to scale up, even when `scaleToZero` is enabled, so I was wondering if it could be of use.
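For what it's worth, this is how I was watching the `type_constraints` update (namespace assumed to be `modelmesh-serving`):
```sh
# Watch the tc-config ConfigMap to see type_constraints change as SRs are created.
kubectl -n modelmesh-serving get configmap tc-config -o yaml -w
```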
As for the second issue about the deletion of the etcd records: I have tried deleting SR2 while SR1 is still up, then deleting ISVC2 after that, and the `vmodels` and `registry` records for ISVC2 are indeed removed successfully. So it seems you are most likely right about the cause of this issue.
Hi, may I know if there are any intentions to address this issue (i.e. modify the controller's behaviour so that it recognises that a ServingRuntime exists without having to wait for the pods to come up)?