Can't use two TensorFlow models in an inference graph with the Seldon protocol
Hi
In my inference graph I have two different TensorFlow models that process the same image, and their results are combined into the final output. Because the deployment uses the Seldon protocol, Seldon sets up a single tfserving container, but it only works for the first TensorFlow model, not the second one.
My inference graph looks like this:

```
         -> preprocessor1 -> crackdetect (TF model) -> postprocessor1 ->
        /                                                               \
image ->                                                                 -> combiner
        \                                                               /
         -> preprocessor2 -> gpsdetect (TF model) -> postprocessor2  ->
```
The `kubectl logs` error below shows that the pod has only one tfserving container:

```
$ kubectl logs seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx
error: a container name must be specified for pod seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx, choose one of: [preprocessor1 preprocessor2 postprocessor1 postprocessor2 combiner crackdetect gpsdetect tfserving seldon-container-engine] or one of the init containers: [tfserving-model-initializer]
```
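(The same container list can also be read straight from the pod spec with plain kubectl; the output shown is just the names from the error message above:)

```
$ kubectl get pod seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx \
    -o jsonpath='{.spec.containers[*].name}'
preprocessor1 preprocessor2 postprocessor1 postprocessor2 combiner crackdetect gpsdetect tfserving seldon-container-engine
```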
The tfserving logs show that only the first model, crackdetect, is loaded:

```
$ kubectl logs seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx tfserving
2022-06-11 15:09:52.410132: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config: model_name: crackdetect model_base_path: /mnt/models
2022-06-11 15:09:52.410752: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-06-11 15:09:52.410769: I tensorflow_serving/model_servers/server_core.cc:591] (Re-)adding model: crackdetect
2022-06-11 15:09:52.424030: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: crackdetect version: 1}
2022-06-11 15:09:52.425859: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: crackdetect version: 1}
2022-06-11 15:09:52.425882: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: crackdetect version: 1}
2022-06-11 15:09:52.426067: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /mnt/models/1
2022-06-11 15:09:52.610643: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-06-11 15:09:52.610683: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /mnt/models/1
2022-06-11 15:09:52.611037: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-11 15:09:52.808190: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:211] Restoring SavedModel bundle.
2022-06-11 15:09:52.984589: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:195] Running initialization op on SavedModel bundle at path: /mnt/models/1
2022-06-11 15:09:53.159152: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 728168 microseconds.
2022-06-11 15:09:53.176492: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /mnt/models/1/assets.extra/tf_serving_warmup_requests
2022-06-11 15:09:53.177395: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: crackdetect version: 1}
2022-06-11 15:09:53.189281: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-06-11 15:09:53.189373: I tensorflow_serving/model_servers/server.cc:133] Using InsecureServerCredentials
2022-06-11 15:09:53.189397: I tensorflow_serving/model_servers/server.cc:383] Profiler service is enabled
2022-06-11 15:09:53.206571: I tensorflow_serving/model_servers/server.cc:409] Running gRPC ModelServer at 0.0.0.0:2000 ...
[warn] getaddrinfo: address family for nodename not supported
2022-06-11 15:09:53.208236: I tensorflow_serving/model_servers/server.cc:430] Exporting HTTP/REST API at:localhost:2001 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
```
When we send a request to the service, the container of the second TensorFlow model, gpsdetect, logs the following error:

```
2022-06-11 15:09:58,997 - seldon_core.gunicorn_utils:load:103 - INFO: Tracing not active
2022-06-11 15:20:37,797 - root:predict:164 - WARNING: Error from server: <Response [404]> content: {
    "error": "Servable not found for request: Latest(gpsdetect)"
```

And the client receives this error response:

```
{"status":{"code":-1,"info":"too many indices for array: array is 0-dimensional, but 1 were indexed","reason":"MICROSERVICE_INTERNAL_ERROR","status":1}}
```
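The symptom can be confirmed directly against TF Serving's model-status REST API on port 2001 (the REST port from the tfserving logs above). A quick sketch, not part of the original report, using standard `kubectl port-forward`:

```
$ kubectl port-forward pod/seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx 2001:2001 &
$ curl -s localhost:2001/v1/models/crackdetect   # reports version 1, state AVAILABLE
$ curl -s localhost:2001/v1/models/gpsdetect     # 404: "Servable not found for request: Latest(gpsdetect)"
```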
My deployment file:
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: combiner
spec:
  name: combiner
  protocol: seldon
  annotations:
    seldon.io/rest-timeout: "15000"
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/metadata-generic-node:preprocessor
          name: preprocessor1
        - image: seldonio/metadata-generic-node:preprocessor
          name: preprocessor2
        - image: seldonio/metadata-generic-node:postprocessor
          name: postprocessor1
        - image: seldonio/metadata-generic-node:postprocessor
          name: postprocessor2
        - image: seldonio/metadata-generic-node:combiner
          name: combiner
    graph:
      name: combiner
      type: COMBINER
      children:
      - name: postprocessor1
        type: OUTPUT_TRANSFORMER
        children:
        - name: preprocessor1
          type: TRANSFORMER
          children:
          - name: crackdetect
            implementation: TENSORFLOW_SERVER
            modelUri: s3://xxxx/crackdetect
            envSecretRefName: seldon-rclone-secret
            type: MODEL
      - name: postprocessor2
        type: OUTPUT_TRANSFORMER
        children:
        - name: preprocessor2
          type: TRANSFORMER
          children:
          - name: gpsdetect
            implementation: TENSORFLOW_SERVER
            modelUri: s3://xxxx/gpsdetect
            envSecretRefName: seldon-rclone-secret
            type: MODEL
    name: test
    replicas: 1
```
OK, will need to investigate. The code that creates it is here
Are you seeing just 1 tfserving container?
From memory, the TensorFlow proxy only supports a limited set of inputs, so for combiners you would need to create your own prepackaged server or extend the TensorFlow Serving proxy logic
Yes, only one tfserving container, and it only works for the first TF model
I think the best solution would be for the tfserving container to load all the TF models defined in the inference graph and make each of them servable, so there is no need to create a separate TF Serving container per TF model.
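For reference, TF Serving itself can already serve several models from one process when started with `--model_config_file`; the fix proposed above would roughly amount to generating a config like the sketch below. The base paths are illustrative, since Seldon currently mounts a single model at `/mnt/models`:

```
model_config_list {
  config {
    name: 'crackdetect'
    base_path: '/mnt/models/crackdetect'
    model_platform: 'tensorflow'
  }
  config {
    name: 'gpsdetect'
    base_path: '/mnt/models/gpsdetect'
    model_platform: 'tensorflow'
  }
}
```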
We are supporting V2 (the Open Inference Protocol) going forward, so please also try V2.
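As a rough illustration only (not a tested drop-in: the exact protocol value and prepackaged server depend on the Seldon Core version, and the URI is the placeholder from the original report), one of the models under V2 might look like:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: gpsdetect-v2
spec:
  protocol: v2                      # Open Inference Protocol ("kfserving" in older releases)
  predictors:
  - graph:
      name: gpsdetect
      implementation: TRITON_SERVER # a separate prepackaged server per model avoids the shared tfserving container
      modelUri: s3://xxxx/gpsdetect
      envSecretRefName: seldon-rclone-secret
      type: MODEL
    name: default
    replicas: 1
```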