
Metadata.yaml does not work with TensorFlow prepackaged server / Seldon protocol

Open jacobmalmberg opened this issue 3 years ago • 3 comments

Describe the bug

Placing a metadata.yaml file with metadata information about the model in the model's S3 bucket does not work when using the prepackaged TensorFlow server with the Seldon protocol. When executing curl service:/api/v1.0/metadata | jq . this metadata (see "To reproduce" below for the exact YAML) should be presented, but instead I get

{
  "name": "default",
  "models": {
    "mnist-model": {
      "name": "seldonio/tfserving-proxy",
      "versions": [
        "1.12.0-dev"
      ],
      "inputs": [],
      "outputs": []
    }
  },
  "graphinputs": [],
  "graphoutputs": []
}

Metadata is not imported from metadata.yaml but is seemingly taken from the image name of the model container (seldonio/tfserving-proxy:1.12.0-dev). According to https://docs.seldon.io/projects/seldon-core/en/latest/referenceapis/metadata.html#prepackaged-model-servers, the metadata presented should come from metadata.yaml.

To reproduce

I run the mnist example from https://docs.seldon.io/projects/seldon-core/en/latest/servers/tensorflow.html with an extra metadata.yaml file in the bucket.

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tfserving
spec:
  name: mnist
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: s3://seldon-models/tfserving/mnist-model-with-metadata
      storageInitializerImage: r-clone-with-org-certs:latest
      name: mnist-model
      parameters:
        - name: signature_name
          type: STRING
          value: predict_images
        - name: model_name
          type: STRING
          value: mnist-model
    name: default
    replicas: 1

Metadata.yaml

name: mnist-model
versions: [1]
platform: tensorflow

Expected behaviour

curl mnist-model-default:8000/api/v1.0/metadata | jq . should yield

{
  "name": "default",
  "models": {
    "mnist-model": {
      "name": "mnist-model",
      "platform": "tensorflow",
      "versions": [
        "1"
      ],
      "inputs": [],
      "outputs": []
    }
  },
  "graphinputs": [],
  "graphoutputs": []
}

Environment

Seldon 1.12.0-dev

  • Cloud Provider: On prem
  • Kubernetes Cluster Version v1.21.1
  • Deployed Seldon System Images:

    value: docker.io/seldonio/engine:1.12.0-dev
    value: seldonio/seldon-core-executor:1.12.0-dev
    image: seldonio/seldon-core-operator:1.12.0-dev

Model Details

kubectl logs tfserving-mnist-default-0-mnist-model-54cb7954f9-cst2r -c mnist-model                                                                                                      
starting microservice
2021-12-09 08:27:06.634624: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-09 08:27:06.634664: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-12-09 08:27:09,288 - seldon_core.microservice:main:211 - INFO:  Starting microservice.py:main
2021-12-09 08:27:09,288 - seldon_core.microservice:main:212 - INFO:  Seldon Core version: 1.12.0-dev
2021-12-09 08:27:09,291 - seldon_core.microservice:main:367 - INFO:  Parse JAEGER_EXTRA_TAGS []
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation cni.projectcalico.org/containerID:92ef0cf2ff16b665f0f2057a1f901396bdc27c6072898eb422e0067d0ae93d48
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation cni.projectcalico.org/podIP:10.42.8.71/32
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation cni.projectcalico.org/podIPs:10.42.8.71/32
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation kubernetes.io/config.seen:2021-12-09T08:27:02.472518348Z
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation kubernetes.io/config.source:api
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation prometheus.io/path:/prometheus
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation prometheus.io/scrape:true
2021-12-09 08:27:09,291 - seldon_core.microservice:main:370 - INFO:  Annotations: {'cni.projectcalico.org/containerID': '92ef0cf2ff16b665f0f2057a1f901396bdc27c6072898eb422e0067d0ae93d48', 'cni.projectcalico.org/podIP': '10.42.8.71/32', 'cni.projectcalico.org/podIPs': '10.42.8.71/32', 'kubernetes.io/config.seen': '2021-12-09T08:27:02.472518348Z', 'kubernetes.io/config.source': 'api', 'prometheus.io/path': '/prometheus', 'prometheus.io/scrape': 'true'}
2021-12-09 08:27:09,291 - seldon_core.microservice:main:374 - INFO:  Importing TfServingProxy
2021-12-09 08:27:09,322 - seldon_core.microservice:main:463 - INFO:  REST gunicorn microservice running on port 9000
2021-12-09 08:27:09,323 - seldon_core.microservice:main:557 - INFO:  REST metrics microservice running on port 6000
2021-12-09 08:27:09,323 - seldon_core.microservice:main:567 - INFO:  Starting servers
2021-12-09 08:27:09,332 - seldon_core.microservice:grpc_prediction_server:520 - INFO:  GRPC Server Binding to '%s' 0.0.0.0:9500 with 1 processes
2021-12-09 08:27:09,335 - seldon_core.microservice:rest_prediction_server:448 - INFO:  Gunicorn Config:  {'bind': '0.0.0.0:9000', 'accesslog': None, 'loglevel': 'info', 'timeout': 5000, 'threads': 1, 'workers': 1, 'max_requests': 0, 'max_requests_jitter': 0, 'post_worker_init': <function post_worker_init at 0x7f4f755a3320>, 'worker_exit': functools.partial(<function worker_exit at 0x7f4f75530f80>, seldon_metrics=<seldon_core.metrics.SeldonMetrics object at 0x7f4f752be590>), 'keepalive': 2}
2021-12-09 08:27:09,339 - seldon_core.wrapper:_set_flask_app_configs:224 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
2021-12-09 08:27:09,339 - seldon_core.microservice:_run_grpc_server:475 - INFO:  Starting new GRPC server with 1.
2021-12-09 08:27:09,342 - seldon_core.wrapper:_set_flask_app_configs:224 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
[2021-12-09 08:27:09 +0000] [108] [INFO] Starting gunicorn 20.1.0
[2021-12-09 08:27:09 +0000] [108] [INFO] Listening at: http://0.0.0.0:6000 (108)
[2021-12-09 08:27:09 +0000] [108] [INFO] Using worker: sync
[2021-12-09 08:27:09 +0000] [127] [INFO] Booting worker with pid: 127
[2021-12-09 08:27:09 +0000] [7] [INFO] Starting gunicorn 20.1.0
[2021-12-09 08:27:09 +0000] [7] [INFO] Listening at: http://0.0.0.0:9000 (7)
[2021-12-09 08:27:09 +0000] [7] [INFO] Using worker: sync
[2021-12-09 08:27:09 +0000] [134] [INFO] Booting worker with pid: 134
2021-12-09 08:27:09,371 - seldon_core.gunicorn_utils:load:103 - INFO:  Tracing not active

jacobmalmberg avatar Dec 09 '21 08:12 jacobmalmberg

The prepackaged TensorFlow server with the Seldon protocol uses a proxy.

I think we would need to extend this to ensure the metadata is handled by the proxy. It may not presently have access to the downloaded artifacts.

ukclivecox avatar Jan 09 '22 10:01 ukclivecox

Access to artifacts may be one thing, but we should also extend the TfServingProxy class to implement init_metadata, like we have here, for example: https://github.com/SeldonIO/seldon-core/blob/master/servers/sklearnserver/sklearnserver/SKLearnServer.py#L53-L66

RafalSkolasinski avatar Jan 10 '22 17:01 RafalSkolasinski

The SKLearnServer does get model_uri as one of its parameters; the TfServingProxy does not.
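
For illustration only (this is not code from the repo), a rough sketch of what such an init_metadata could look like if the proxy were given a model_uri parameter, mirroring the linked SKLearnServer example:

# Hypothetical sketch: a TfServingProxy that accepts a model_uri parameter
# (not passed today) and implements init_metadata by reading metadata.yaml
# from the model folder, following the SKLearnServer pattern linked above.
import logging
import os

import yaml

logger = logging.getLogger(__name__)


class TfServingProxy:
    def __init__(self, model_uri: str = None, **kwargs):
        # model_uri would need to be wired through by the operator/wrapper;
        # currently the proxy does not receive it.
        self.model_uri = model_uri

    def init_metadata(self):
        if not self.model_uri:
            return {}
        file_path = os.path.join(self.model_uri, "metadata.yaml")
        try:
            with open(file_path, "r") as f:
                return yaml.safe_load(f.read())
        except FileNotFoundError:
            logger.debug(f"metadata file {file_path} does not exist")
            return {}
        except yaml.YAMLError:
            logger.error(f"metadata file {file_path} is not valid yaml")
            return {}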

RafalSkolasinski avatar Jan 10 '22 17:01 RafalSkolasinski

Closing

ukclivecox avatar Dec 19 '22 11:12 ukclivecox