Open
sgaist
opened this issue 1 year ago
· 5 comments
Description
This PR ensures that a default value for the KServe protocol is passed forward to the ModelServer class and that the class itself uses a default value if instantiated directly.
With the current implementation, if the PROTOCOL_VERSION environment variable is not set, the protocol property of the ModelServer instance will be left unset as well, which triggers an exception when data is returned after processing, because the base class validates the protocol's value.
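As an illustration, here is a minimal sketch of the intended fallback logic; the PredictorProtocol stand-in mirrors the enum named in the error logs below, and the "v1" default is an assumed placeholder for whatever default the wrapper actually uses:

```python
import os
from enum import Enum


# Stand-in for kserve's PredictorProtocol enum; the member values follow the
# "is not a valid PredictorProtocol" error shown in the logs further down.
class PredictorProtocol(Enum):
    REST_V1 = "v1"
    REST_V2 = "v2"
    GRPC_V2 = "grpc-v2"


# Assumed default; the actual value is whatever the wrapper/ModelServer defines.
DEFAULT_PROTOCOL = PredictorProtocol.REST_V1.value

# Fall back to the default when PROTOCOL_VERSION is unset instead of passing None.
protocol = os.environ.get("PROTOCOL_VERSION", DEFAULT_PROTOCOL)
PredictorProtocol(protocol)  # raises ValueError for invalid values such as "v3"
```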
Fixes #2956
Type of change
Please delete options that are not relevant.
[X] Bug fix (non-breaking change which fixes an issue)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] New feature (non-breaking change which adds functionality)
[ ] This change requires a documentation update
Feature/Issue validation/testing
A new test has been added to ensure that (a sketch of the three cases follows the list):
- If the PROTOCOL_VERSION environment variable is not set, a default is used
- If the PROTOCOL_VERSION environment variable is set to a valid value, that value is indeed the one used
- If the PROTOCOL_VERSION environment variable is set to an invalid value, the wrapper fails to start
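A hypothetical pytest sketch of those three cases; the resolve_protocol helper, the test names, and the "v1" default are illustrative stand-ins, not the actual test added in this PR:

```python
import os

import pytest


def resolve_protocol(default="v1"):
    # Stand-in for the wrapper's protocol resolution (assumed behavior).
    value = os.environ.get("PROTOCOL_VERSION", default)
    if value not in ("v1", "v2", "grpc-v2"):
        raise ValueError(f"{value} is not a valid PredictorProtocol")
    return value


def test_default_is_used_when_unset(monkeypatch):
    monkeypatch.delenv("PROTOCOL_VERSION", raising=False)
    assert resolve_protocol() == "v1"


def test_valid_value_is_used(monkeypatch):
    monkeypatch.setenv("PROTOCOL_VERSION", "v2")
    assert resolve_protocol() == "v2"


def test_invalid_value_fails(monkeypatch):
    monkeypatch.setenv("PROTOCOL_VERSION", "v3")
    with pytest.raises(ValueError):
        resolve_protocol()
```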
Checklist:
[X] Did you have fun?
[X] Have you added tests that prove your fix is effective or that this feature works?
[X] Has code been commented, particularly in hard-to-understand areas?
[X] Have you made corresponding changes to the documentation?
I currently don't know why linting failed on the changed files: I installed the pre-commit hooks and, just to be sure, manually ran them on the files I changed, and they were all deemed clean.
@agunapal I just realized that for the test to run properly, kserve must be installed and, in my experience, the kserve[storage] variant. While adding it to developer.txt, I got hit by an incompatible requirement: serve uses 4.25.1 while kserve requires <4.0.0.
What would you recommend going forward? Isolate the test and create a dedicated flow for it?
Yes, I'm not sure if we want to do this.
Can you please check if you can add a test here https://github.com/pytorch/serve/blob/master/kubernetes/kserve/tests/scripts/test_mnist.sh
After having fiddled a bit more with test_mnist.sh, I can suggest three solutions:
1. Set the kserve version to v0.11.0 (the last one before they started systematically defining the PROTOCOL_VERSION environment variable) in kserve_cpu_tests.yml.
2. Modify test_mnist.sh to deploy kserve v0.11.0 after the current set of tests and re-run mnist_v1_cpu.
3. Make a matrix of the KServe versions to test against.
Number one has the downside of using an outdated version of KServe.
Number two makes the tests last a bit longer but does not preclude updating the KServe version used for all other tests.
Number three will multiply the test duration by the number of KServe versions tested, but confirms that everything works properly on each of them. It also means that test_mnist.sh does not need to be modified.
The options are listed in ascending order of resource/time impact.
Which one would you like to have?
Thanks for looking into this.
If this is already implemented in the latest version, I think you can manually run a test with kserve 0.11.0 and attach the logs.
FYI, we are making a release soon. So, we will review this after the release.
Also curious: if this is resolved in the latest version of kserve, can't you use that?
KServe 0.11.0 run for protocol v2 prior to this PR
Check status of the pod
NAME READY STATUS RESTARTS AGE
torchserve-mnist-v2-predictor-00001-deployment-9569d6f8b-8w6pb 1/2 Running 0 2m
Name: torchserve-mnist-v2-predictor-00001-deployment-9569d6f8b-8w6pb
Namespace: default
Priority: 0
Service Account: default
Node: minikube/192.168.49.2
Start Time: Thu, 29 Feb 2024 10:26:12 +0100
Labels: app=torchserve-mnist-v2-predictor-00001
component=predictor
pod-template-hash=9569d6f8b
service.istio.io/canonical-name=torchserve-mnist-v2-predictor
service.istio.io/canonical-revision=torchserve-mnist-v2-predictor-00001
serviceEnvelope=kservev2
serving.knative.dev/configuration=torchserve-mnist-v2-predictor
serving.knative.dev/configurationGeneration=1
serving.knative.dev/configurationUID=a40d926b-2a09-4d00-9aa5-34e09fed7608
serving.knative.dev/revision=torchserve-mnist-v2-predictor-00001
serving.knative.dev/revisionUID=cdff5033-616f-454c-a27c-03984823cd74
serving.knative.dev/service=torchserve-mnist-v2-predictor
serving.knative.dev/serviceUID=f04ce05a-78a7-44ed-8473-3cf46fb117b6
serving.kserve.io/inferenceservice=torchserve-mnist-v2
Annotations: autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/min-scale: 1
internal.serving.kserve.io/storage-initializer-sourceuri: gs://kfserving-examples/models/torchserve/image_classifier/v2
prometheus.kserve.io/path: /metrics
prometheus.kserve.io/port: 8082
serving.knative.dev/creator: system:serviceaccount:kserve:kserve-controller-manager
serving.kserve.io/enable-metric-aggregation: false
serving.kserve.io/enable-prometheus-scraping: false
Status: Running
IP: 10.244.0.17
IPs:
IP: 10.244.0.17
Controlled By: ReplicaSet/torchserve-mnist-v2-predictor-00001-deployment-9569d6f8b
Init Containers:
storage-initializer:
Container ID: docker://6f8e446c5344730eea59403bf3d9b3307fe96c3d8df8c6cdcd63773a852aec84
Image: kserve/storage-initializer:v0.11.0
Image ID: docker-pullable://kserve/storage-initializer@sha256:962682077dc30c21113822f50b63400d759ce6c49518d0c9caa638b6f77c7fed
Port:
Host Port:
Args:
gs://kfserving-examples/models/torchserve/image_classifier/v2
/mnt/models
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 29 Feb 2024 10:26:39 +0100
Finished: Thu, 29 Feb 2024 10:26:56 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 100Mi
Environment:
Mounts:
/mnt/models from kserve-provision-location (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wbm2 (ro)
Containers:
kserve-container:
Container ID: docker://cc99e7eecaf2c842fcea9cc30fb301e6427a4cfa259f7b42cb0806e6a92ea1e0
Image: index.docker.io/pytorch/torchserve-kfs-nightly@sha256:5ec33cf3085b56b5eedb6b11c67faa5b7e669019573a26459eb49a2394531be2
Image ID: docker-pullable://pytorch/torchserve-kfs-nightly@sha256:5ec33cf3085b56b5eedb6b11c67faa5b7e669019573a26459eb49a2394531be2
Port: 8080/TCP
Host Port: 0/TCP
Args:
torchserve
--start
--model-store=/mnt/models/model-store
--ts-config=/mnt/models/config/config.properties
State: Running
Started: Thu, 29 Feb 2024 10:28:02 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 256Mi
Environment:
TS_SERVICE_ENVELOPE: kservev2
PORT: 8080
K_REVISION: torchserve-mnist-v2-predictor-00001
K_CONFIGURATION: torchserve-mnist-v2-predictor
K_SERVICE: torchserve-mnist-v2-predictor
Mounts:
/mnt/models from kserve-provision-location (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wbm2 (ro)
queue-proxy:
Container ID: docker://ad160b0a42f80dc14996c9acf54185125788bca57ec79eb36ca9193823a37521
Image: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f
Image ID: docker-pullable://gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f
Ports: 8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP, 8112/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Thu, 29 Feb 2024 10:28:07 +0100
Ready: False
Restart Count: 0
Requests:
cpu: 25m
Readiness: http-get http://:8012/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SERVING_NAMESPACE: default
SERVING_SERVICE: torchserve-mnist-v2-predictor
SERVING_CONFIGURATION: torchserve-mnist-v2-predictor
SERVING_REVISION: torchserve-mnist-v2-predictor-00001
QUEUE_SERVING_PORT: 8012
QUEUE_SERVING_TLS_PORT: 8112
CONTAINER_CONCURRENCY: 0
REVISION_TIMEOUT_SECONDS: 300
REVISION_RESPONSE_START_TIMEOUT_SECONDS: 0
REVISION_IDLE_TIMEOUT_SECONDS: 0
SERVING_POD: torchserve-mnist-v2-predictor-00001-deployment-9569d6f8b-8w6pb (v1:metadata.name)
SERVING_POD_IP: (v1:status.podIP)
SERVING_LOGGING_CONFIG:
SERVING_LOGGING_LEVEL:
SERVING_REQUEST_LOG_TEMPLATE: {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
SERVING_ENABLE_REQUEST_LOG: false
SERVING_REQUEST_METRICS_BACKEND: prometheus
TRACING_CONFIG_BACKEND: none
TRACING_CONFIG_ZIPKIN_ENDPOINT:
TRACING_CONFIG_DEBUG: false
TRACING_CONFIG_SAMPLE_RATE: 0.1
USER_PORT: 8080
SYSTEM_NAMESPACE: knative-serving
METRICS_DOMAIN: knative.dev/internal/serving
SERVING_READINESS_PROBE: {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
ENABLE_PROFILING: false
SERVING_ENABLE_PROBE_REQUEST_LOG: false
METRICS_COLLECTOR_ADDRESS:
HOST_IP: (v1:status.hostIP)
ENABLE_HTTP2_AUTO_DETECTION: false
ROOT_CA:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wbm2 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-8wbm2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
kserve-provision-location:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m default-scheduler Successfully assigned default/torchserve-mnist-v2-predictor-00001-deployment-9569d6f8b-8w6pb to minikube
Normal Pulling 117s kubelet Pulling image "kserve/storage-initializer:v0.11.0"
Normal Pulled 100s kubelet Successfully pulled image "kserve/storage-initializer:v0.11.0" in 17.382s (17.382s including waiting)
Normal Created 93s kubelet Created container storage-initializer
Normal Started 93s kubelet Started container storage-initializer
Normal Pulling 76s kubelet Pulling image "index.docker.io/pytorch/torchserve-kfs-nightly@sha256:5ec33cf3085b56b5eedb6b11c67faa5b7e669019573a26459eb49a2394531be2"
Normal Pulled 13s kubelet Successfully pulled image "index.docker.io/pytorch/torchserve-kfs-nightly@sha256:5ec33cf3085b56b5eedb6b11c67faa5b7e669019573a26459eb49a2394531be2" in 1m2.971s (1m2.971s including waiting)
Normal Created 11s kubelet Created container kserve-container
Normal Started 10s kubelet Started container kserve-container
Normal Pulling 10s kubelet Pulling image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f"
Normal Pulled 6s kubelet Successfully pulled image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f" in 4.128s (4.128s including waiting)
Normal Created 5s kubelet Created container queue-proxy
Normal Started 5s kubelet Started container queue-proxy
Warning Unhealthy 2s (x2 over 4s) kubelet Readiness probe failed: Get "http://10.244.0.17:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Wait for inference service to be ready
Wait for ports to be in forwarding
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Make inference request
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0Handling connection for 8080
100 6754 100 62 100 6692 98 10654 --:--:-- --:--:-- --:--:-- 10737
✘ Test failed: Prediction: {"error":"ValueError:NoneisnotavalidPredictorProtocol"}, expected {"model_name":"mnist","model_version":"1.0","id":"d3b15cad-50a2-4eaf-80ce-8b0a428bd298","parameters":null,"outputs":[{"name":"input-0","shape":[1],"datatype":"INT64","parameters":null,"data":[1]}]}.
Delete cluster
KServe 0.11.0 run for default protocol prior to this PR
MNIST KServe V1 test begin
Deploying the cluster
inferenceservice.serving.kserve.io/torchserve created
Waiting for pod to come up...
Check status of the pod
NAME READY STATUS RESTARTS AGE
torchserve-predictor-00001-deployment-c8b8c4595-mxtpf 1/2 Running 0 115s
Name: torchserve-predictor-00001-deployment-c8b8c4595-mxtpf
Namespace: default
Priority: 0
Service Account: default
Node: minikube/192.168.49.2
Start Time: Thu, 29 Feb 2024 09:35:17 +0100
Labels: app=torchserve-predictor-00001
component=predictor
pod-template-hash=c8b8c4595
service.istio.io/canonical-name=torchserve-predictor
service.istio.io/canonical-revision=torchserve-predictor-00001
serviceEnvelope=kserve
serving.knative.dev/configuration=torchserve-predictor
serving.knative.dev/configurationGeneration=1
serving.knative.dev/configurationUID=36ca148d-50d5-402e-9c78-3becaa325179
serving.knative.dev/revision=torchserve-predictor-00001
serving.knative.dev/revisionUID=a1de2f03-1a4f-4620-a18c-e96e568709b8
serving.knative.dev/service=torchserve-predictor
serving.knative.dev/serviceUID=aeb06d26-b27d-42fc-8167-46c1521410ea
serving.kserve.io/inferenceservice=torchserve
Annotations: autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/min-scale: 1
internal.serving.kserve.io/storage-initializer-sourceuri: gs://kfserving-examples/models/torchserve/image_classifier/v1
prometheus.kserve.io/path: /metrics
prometheus.kserve.io/port: 8082
serving.knative.dev/creator: system:serviceaccount:kserve:kserve-controller-manager
serving.kserve.io/enable-metric-aggregation: false
serving.kserve.io/enable-prometheus-scraping: false
Status: Running
IP: 10.244.0.17
IPs:
IP: 10.244.0.17
Controlled By: ReplicaSet/torchserve-predictor-00001-deployment-c8b8c4595
Init Containers:
storage-initializer:
Container ID: docker://98f5c07ad65f6095f52916559fa118f45c64ffb342637c2caac5b55aa8e718a6
Image: kserve/storage-initializer:v0.11.0
Image ID: docker-pullable://kserve/storage-initializer@sha256:962682077dc30c21113822f50b63400d759ce6c49518d0c9caa638b6f77c7fed
Port:
Host Port:
Args:
gs://kfserving-examples/models/torchserve/image_classifier/v1
/mnt/models
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 29 Feb 2024 09:35:43 +0100
Finished: Thu, 29 Feb 2024 09:35:59 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 100Mi
Environment:
Mounts:
/mnt/models from kserve-provision-location (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pfjj8 (ro)
Containers:
kserve-container:
Container ID: docker://9f5a3a87a2cb3d393e24806e178f7fbd66cdc74bf99875334336ca8897d6cac7
Image: index.docker.io/pytorch/torchserve-kfs-nightly@sha256:5ec33cf3085b56b5eedb6b11c67faa5b7e669019573a26459eb49a2394531be2
Image ID: docker-pullable://pytorch/torchserve-kfs-nightly@sha256:5ec33cf3085b56b5eedb6b11c67faa5b7e669019573a26459eb49a2394531be2
Port: 8080/TCP
Host Port: 0/TCP
Args:
torchserve
--start
--model-store=/mnt/models/model-store
--ts-config=/mnt/models/config/config.properties
State: Running
Started: Thu, 29 Feb 2024 09:37:05 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 256Mi
Environment:
TS_SERVICE_ENVELOPE: kserve
PORT: 8080
K_REVISION: torchserve-predictor-00001
K_CONFIGURATION: torchserve-predictor
K_SERVICE: torchserve-predictor
Mounts:
/mnt/models from kserve-provision-location (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pfjj8 (ro)
queue-proxy:
Container ID: docker://eb7c37a231243d0e19973d392aa8a9425429eb10aea1c8a415353768cae214ad
Image: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f
Image ID: docker-pullable://gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f
Ports: 8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP, 8112/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Thu, 29 Feb 2024 09:37:11 +0100
Ready: False
Restart Count: 0
Requests:
cpu: 25m
Readiness: http-get http://:8012/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SERVING_NAMESPACE: default
SERVING_SERVICE: torchserve-predictor
SERVING_CONFIGURATION: torchserve-predictor
SERVING_REVISION: torchserve-predictor-00001
QUEUE_SERVING_PORT: 8012
QUEUE_SERVING_TLS_PORT: 8112
CONTAINER_CONCURRENCY: 0
REVISION_TIMEOUT_SECONDS: 300
REVISION_RESPONSE_START_TIMEOUT_SECONDS: 0
REVISION_IDLE_TIMEOUT_SECONDS: 0
SERVING_POD: torchserve-predictor-00001-deployment-c8b8c4595-mxtpf (v1:metadata.name)
SERVING_POD_IP: (v1:status.podIP)
SERVING_LOGGING_CONFIG:
SERVING_LOGGING_LEVEL:
SERVING_REQUEST_LOG_TEMPLATE: {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
SERVING_ENABLE_REQUEST_LOG: false
SERVING_REQUEST_METRICS_BACKEND: prometheus
TRACING_CONFIG_BACKEND: none
TRACING_CONFIG_ZIPKIN_ENDPOINT:
TRACING_CONFIG_DEBUG: false
TRACING_CONFIG_SAMPLE_RATE: 0.1
USER_PORT: 8080
SYSTEM_NAMESPACE: knative-serving
METRICS_DOMAIN: knative.dev/internal/serving
SERVING_READINESS_PROBE: {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
ENABLE_PROFILING: false
SERVING_ENABLE_PROBE_REQUEST_LOG: false
METRICS_COLLECTOR_ADDRESS:
HOST_IP: (v1:status.hostIP)
ENABLE_HTTP2_AUTO_DETECTION: false
ROOT_CA:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pfjj8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-pfjj8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
kserve-provision-location:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 115s default-scheduler Successfully assigned default/torchserve-predictor-00001-deployment-c8b8c4595-mxtpf to minikube
Normal Pulling 113s kubelet Pulling image "kserve/storage-initializer:v0.11.0"
Normal Pulled 95s kubelet Successfully pulled image "kserve/storage-initializer:v0.11.0" in 17.862s (17.862s including waiting)
Normal Created 89s kubelet Created container storage-initializer
Normal Started 89s kubelet Started container storage-initializer
Normal Pulling 72s kubelet Pulling image "index.docker.io/pytorch/torchserve-kfs-nightly@sha256:5ec33cf3085b56b5eedb6b11c67faa5b7e669019573a26459eb49a2394531be2"
Normal Pulled 9s kubelet Successfully pulled image "index.docker.io/pytorch/torchserve-kfs-nightly@sha256:5ec33cf3085b56b5eedb6b11c67faa5b7e669019573a26459eb49a2394531be2" in 1m2.861s (1m2.861s including waiting)
Normal Created 7s kubelet Created container kserve-container
Normal Started 7s kubelet Started container kserve-container
Normal Pulling 7s kubelet Pulling image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f"
Normal Pulled 2s kubelet Successfully pulled image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f" in 4.483s (4.483s including waiting)
Normal Created 1s kubelet Created container queue-proxy
Normal Started 1s kubelet Started container queue-proxy
Wait for inference service to be ready
Wait for ports to be in forwarding
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Make inference request
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0Handling connection for 8080
100 463 100 62 100 401 405 2621 --:--:-- --:--:-- --:--:-- 3026
✘ Test failed: Prediction: {"error":"ValueError:NoneisnotavalidPredictorProtocol"}, expected {"predictions":[2]}.
I did two runs to show that having the protocolVersion entry in the InferenceService definition (sketched below) does not change the failure.
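For reference, a minimal sketch of such an InferenceService definition, written as a Python dict to stay consistent with the other snippets here; the names and storageUri are taken from the v2 run above, and the remaining fields are illustrative:

```python
# Hypothetical InferenceService manifest (as a Python dict for illustration);
# even with "protocolVersion" set explicitly, the failure above was unchanged.
inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "torchserve-mnist-v2"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "pytorch"},
                "protocolVersion": "v2",
                "storageUri": "gs://kfserving-examples/models/torchserve/image_classifier/v2",
            }
        }
    },
}
```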
While I can update the version of KServe, I think many people creating models to be used with torchserve do not control the underlying infrastructure they run on, so fixing the wrapper will have a broader impact.