
Can't support two TensorFlow models in an inference graph with the seldon protocol

Open · stevexiao2012 opened this issue on Jun 11, 2022 · 4 comments

Hi

In my inference graph, I have two different TensorFlow models that process the same image, and their results are combined into the final output. Because the deployment uses the seldon protocol, Seldon sets up a single tfserving container, but it only serves the first TensorFlow model, not the second one.

My inference graph looks like this:

              ->preprocessor1->crackdetect(tf model)->postprocessor1->
            /                                                          \
image                                                                   -> combiner
            \                                                          /
              ->preprocessor2->gpsdetect(tf model)->postprocessor2  -> 

The command below shows there is only one tfserving container:

kubectl logs seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx
error: a container name must be specified for pod seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx, choose one of: [preprocessor1 preprocessor2 postprocessor1 postprocessor2 combiner crackdetect gpsdetect tfserving seldon-container-engine] or one of the init containers: [tfserving-model-initializer]

The tfserving logs show that only the first model, crackdetect, is loaded:

kubectl logs seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx tfserving
2022-06-11 15:09:52.410132: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config: model_name: crackdetect model_base_path: /mnt/models
2022-06-11 15:09:52.410752: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-06-11 15:09:52.410769: I tensorflow_serving/model_servers/server_core.cc:591] (Re-)adding model: crackdetect
2022-06-11 15:09:52.424030: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: crackdetect version: 1}
2022-06-11 15:09:52.425859: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: crackdetect version: 1}
2022-06-11 15:09:52.425882: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: crackdetect version: 1}
2022-06-11 15:09:52.426067: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /mnt/models/1
2022-06-11 15:09:52.610643: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-06-11 15:09:52.610683: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /mnt/models/1
2022-06-11 15:09:52.611037: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-11 15:09:52.808190: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:211] Restoring SavedModel bundle.
2022-06-11 15:09:52.984589: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:195] Running initialization op on SavedModel bundle at path: /mnt/models/1
2022-06-11 15:09:53.159152: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 728168 microseconds.
2022-06-11 15:09:53.176492: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /mnt/models/1/assets.extra/tf_serving_warmup_requests
2022-06-11 15:09:53.177395: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: crackdetect version: 1}
2022-06-11 15:09:53.189281: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-06-11 15:09:53.189373: I tensorflow_serving/model_servers/server.cc:133] Using InsecureServerCredentials
2022-06-11 15:09:53.189397: I tensorflow_serving/model_servers/server.cc:383] Profiler service is enabled
2022-06-11 15:09:53.206571: I tensorflow_serving/model_servers/server.cc:409] Running gRPC ModelServer at 0.0.0.0:2000 ...
[warn] getaddrinfo: address family for nodename not supported
2022-06-11 15:09:53.208236: I tensorflow_serving/model_servers/server.cc:430] Exporting HTTP/REST API at:localhost:2001 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...

When we send a request to the service, the container of the second TensorFlow model, gpsdetect, shows the following error log:

2022-06-11 15:09:58,997 - seldon_core.gunicorn_utils:load:103 - INFO: Tracing not active
2022-06-11 15:20:37,797 - root:predict:164 - WARNING: Error from server: <Response [404]> content: { "error": "Servable not found for request: Latest(gpsdetect)"
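
One way to confirm which servables the tfserving container has actually loaded (my own debugging suggestion, assuming access to the pod) is to port-forward the REST port shown in the logs above, kubectl port-forward pod/seldon-9f805e2ca14e68df70d17baed4c0215e-85cf8549ff-94txx 2001:2001, and then query TF Serving's model status endpoint: curl localhost:2001/v1/models/crackdetect should report version 1 as AVAILABLE, while curl localhost:2001/v1/models/gpsdetect should fail, matching the proxy's 404 above.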

And we get the following error response:

{"status":{"code":-1,"info":"too many indices for array: array is 0-dimensional, but 1 were indexed","reason":"MICROSERVICE_INTERNAL_ERROR","status":1}}

My deployment file:


apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: combiner
spec:
  name: combiner
  protocol: seldon
  annotations:
    seldon.io/rest-timeout: "15000"
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - image: seldonio/metadata-generic-node:preprocessor
                name: preprocessor1
              - image: seldonio/metadata-generic-node:preprocessor
                name: preprocessor2
              - image: seldonio/metadata-generic-node:postprocessor
                name: postprocessor1
              - image: seldonio/metadata-generic-node:postprocessor
                name: postprocessor2
              - image: seldonio/metadata-generic-node:combiner
                name: combiner
      graph:
        name: combiner
        type: COMBINER
        children:
        - name: postprocessor1
          type: OUTPUT_TRANSFORMER
          children:
          - name: preprocessor1
            type: TRANSFORMER
            children:
            - name: crackdetect
              implementation: TENSORFLOW_SERVER
              modelUri: s3://xxxx/crackdetect
              envSecretRefName: seldon-rclone-secret
              type: MODEL
        - name: postprocessor2
          type: OUTPUT_TRANSFORMER
          children:
          - name: preprocessor2
            type: TRANSFORMER
            children:
            - name: gpsdetect
              implementation: TENSORFLOW_SERVER
              modelUri: s3://xxxx/gpsdetect
              envSecretRefName: seldon-rclone-secret
              type: MODEL
      name: test
      replicas: 1
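
For comparison, here is a hypothetical pod-spec fragment showing what one might expect instead: one TF Serving container per TENSORFLOW_SERVER node (the container names, ports, and base paths are illustrative, not what Seldon currently generates):

# Hypothetical fragment: one TF Serving container per model node,
# each loading its own model from its own base path on distinct ports.
containers:
  - name: crackdetect-tfserving
    image: tensorflow/serving
    args:
      - --port=2000
      - --rest_api_port=2001
      - --model_name=crackdetect
      - --model_base_path=/mnt/models/crackdetect
  - name: gpsdetect-tfserving
    image: tensorflow/serving
    args:
      - --port=2002
      - --rest_api_port=2003
      - --model_name=gpsdetect
      - --model_base_path=/mnt/models/gpsdetect

As the logs above show, the operator instead injects a single tfserving container configured only with model_name: crackdetect and model_base_path: /mnt/models.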

stevexiao2012 · Jun 11 '22

OK, will need to investigate. The code that creates it is here.

Are you seeing just 1 tfserving container?

ukclivecox · Jun 24 '22

From memory, the tensorflow proxy only supports a limited input, so for combiners you would need to create your own prepackaged server or extend the tensorflow serving proxy logic.

axsaucedo · Jun 24 '22

> OK, will need to investigate. The code that creates it is here.
>
> Are you seeing just 1 tfserving container?

Yes, only 1 tfserving container, and it only works for the first TF model.

stevexiao2012 · Jul 14 '22

> OK, will need to investigate. The code that creates it is here. Are you seeing just 1 tfserving container?
>
> Yes, only 1 tfserving container, and it only works for the first TF model.

I think the best solution is to let the tfserving container load multiple TF models defined in the inference graph and make them all servable, so there is no need to create a TF Serving container for each TF model. (See the sketch below.)
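
As a sketch of what I mean (an assumption on my part; the ConfigMap name, mount path, and per-model base paths are illustrative): TF Serving can already load several models from a single file passed via --model_config_file, so the injected tfserving container could point at a shared config instead of a single --model_name/--model_base_path pair:

# Hypothetical ConfigMap holding a multi-model TF Serving config;
# tfserving would then be started with
# --model_config_file=/mnt/config/models.config
apiVersion: v1
kind: ConfigMap
metadata:
  name: tfserving-models-config
data:
  models.config: |
    model_config_list {
      config {
        name: 'crackdetect'
        base_path: '/mnt/models/crackdetect'
        model_platform: 'tensorflow'
      }
      config {
        name: 'gpsdetect'
        base_path: '/mnt/models/gpsdetect'
        model_platform: 'tensorflow'
      }
    }

With both servables loaded, the crackdetect and gpsdetect proxy containers should each find their model by name, and the Latest(gpsdetect) 404 above should go away.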

stevexiao2012 · Jul 14 '22

We are supporting v2 (the Open Inference Protocol) going forward, so you could also try v2.
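
For reference, a minimal sketch of the v2 route (assumptions: a Seldon Core release that accepts protocol: v2 (older releases call the same protocol kfserving) and the prepackaged Triton server to serve the SavedModel; the deployment name below is illustrative):

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: crackdetect-v2
spec:
  protocol: v2
  predictors:
    - name: default
      graph:
        name: crackdetect
        type: MODEL
        implementation: TRITON_SERVER
        modelUri: s3://xxxx/crackdetect
        envSecretRefName: seldon-rclone-secret

Note that Triton expects its own model repository layout, so the SavedModel may need a config.pbtxt alongside it.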

ukclivecox · Dec 19 '22