Add inference graphs support
Currently we only expose the predict method. However, some orchestration frameworks, like Seldon Core, support other "inference steps", such as routing or aggregation. It would be good to explore how these could be added, probably as a custom extension.
There was an early discussion on how this could be done here: https://docs.google.com/presentation/d/1uHg7qfZxivygo5E-ExcChVqNKqoliYJrJc_j3I6VrkU/edit?usp=sharing
@adriangonz Hi, is this included in the MLServer 1.0 release?
Hey @divyadilip91,
This one is not included in the 1.0 release. However, if you're using Seldon Core, it's already possible to chain MLServer nodes together within an inference graph.
@adriangonz Thank you for the link to the document.
@adriangonz is it possible to chain MLServer nodes if we set seldon.io/no-engine: "true" to ensure Ambassador works?
Hey @kurianbenoy-sentient ,
Unfortunately, the engine / executor is the one chaining and orchestrating the requests across multiple models. So if you disable it, you'd lose that ability. As far as I know, Ambassador should work with the executor enabled, with the exception of custom endpoints.
Thanks for clarifying @adriangonz :)
Hi @adriangonz, what is the URL to use if we connect two MLServer Docker images using Seldon Core? Below is an example of the graph I have created, where two MLServer Docker containers, enmltrans and op, are connected:
graph:
  endpoint:
    type: REST
  name: enmltrans
  type: MODEL
  children:
    - endpoint:
        type: REST
      name: op
      type: MODEL
      children: []
When I try to use the URL http://ip/seldon/default/enmltrans-mlserver/v2/models/enmltranslation/infer, the first component shows a 200 in the logs, but the second component returns a 404 saying the enmltranslation model was not found.
Below is the SeldonDeployment YAML used:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: enmltrans-mlserver
  labels:
    model_name: enmltrans
    api_type: microservice
    microservice_type: ai
spec:
  protocol: kfserving
  annotations:
    project_name: enmltrans
    deployment_version: v0.1.0
    seldon.io/rest-timeout: '60000'
    seldon.io/rest-connection-timeout: '60000'
    seldon.io/grpc-read-timeout: '60000'
  name: enmltrans
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - name: enmltrans
                image: enml_mlserver_test:0.1.0
                imagePullPolicy: Always
                resources:
                  requests:
                    memory: 2Gi
                    cpu: 2
                ports:
                  - containerPort: 8080
                    name: http
                    protocol: TCP
              - name: op
                image: outputtransformer_mlserver:0.1.0
                imagePullPolicy: Always
                resources:
                  requests:
                    memory: 2Gi
                    cpu: 2
                ports:
                  - containerPort: 8083
                    name: http
                    protocol: TCP
      graph:
        endpoint:
          type: REST
        name: enmltrans
        type: MODEL
        children:
          - endpoint:
              type: REST
            name: op
            type: MODEL
            children: []
      name: pod
Please help to resolve this issue. Thanks!
Hey @divyadilip91 ,
Just for context, there are a few nuances when it comes to model names. Seldon Core will expect that, within MLServer, the model is named after the container (i.e. enmltrans or op). Since Seldon Core injects some config parameters (like the model name), this is generally the case.
However, it's also possible that MLServer has some embedded configuration that forces a different name. This is the case, for example, when you build a custom image with mlserver build, which injects the config from a model-settings.json file.
To avoid the above, you can try rebuilding your custom enml_mlserver_test:0.1.0 image, removing the name field from its local model-settings.json file. This will allow the resulting image to accept whatever name Seldon Core is trying to set.
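For reference, a minimal sketch of what the stripped-down model-settings.json could look like, where the implementation class and URI below are just placeholders for your own custom runtime rather than values taken from your image:
{
  "implementation": "models.MyCustomRuntime",
  "parameters": {
    "uri": "./model"
  }
}
With no name field present, the model should register under whatever name Seldon Core injects, i.e. the container name (enmltrans or op), so the executor can route to it.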
Hey @adriangonz,
Just piggy-backing on this issue, but I'm having trouble with gRPC and inference graphs. Specifically, I have two custom runtimes (one model node, one postprocessing node). My SeldonDeployment looks like the following:
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  annotations:
    seldon.io/grpc-max-message-size: '100000000'
    seldon.io/grpc-timeout: '100000'
  namespace: seldon
  name: my-model
  protocol: v2
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - name: my-model
                image: ...
                env:
                  - name: SELDON_LOG_LEVEL
                    value: DEBUG
              - name: postprocessor
                image: ...
                env:
                  - name: SELDON_LOG_LEVEL
                    value: DEBUG
      name: my-model
      replicas: 1
      svcOrchSpec:
        env:
          - name: SELDON_LOG_LEVEL
            value: DEBUG
      graph:
        children:
          - name: postprocessor
            type: MODEL
            children: []
        endpoint:
          type: GRPC
        name: my-model
        type: MODEL
      serviceAccountName: seldon-sa
The model-settings.json leaves the name field blank but this gets overwritten by Seldon. I then make a gRPC request as follows:
import grpc
from mlserver.grpc.dataplane_pb2_grpc import GRPCInferenceServiceStub
from mlserver.grpc.dataplane_pb2 import ModelInferRequest
import mlserver.grpc.converters as converters
import mlserver.types as types

host = "my_host:80"
deployment_name = "my-model"

# gRPC channel options (assumed here to just raise the max message size)
ch_options = [("grpc.max_receive_message_length", 100_000_000)]

with grpc.insecure_channel(host, options=ch_options) as channel:
    stub = GRPCInferenceServiceStub(channel)

    inference_request = types.InferenceRequest(
        inputs=[
            types.RequestInput(
                name="X",
                shape=...,
                datatype="BYTES",
                data=[...],
            )
        ],
    )

    request = converters.ModelInferRequestConverter.from_types(
        inference_request,
        model_name=deployment_name,
        use_raw=False,
    )

    metadata = [("seldon", deployment_name), ("namespace", "seldon")]
    response = stub.ModelInfer(request=request, metadata=metadata)
When I run the above, I get the following traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/mlserver/grpc/utils.py", line 44, in _inner
    return await f(self, request, context)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/grpc/servicers.py", line 77, in ModelInfer
    result = await self._data_plane.infer(
  File "/opt/conda/lib/python3.8/site-packages/mlserver/handlers/dataplane.py", line 95, in infer
    model = await self._model_registry.get_model(name, version)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 307, in get_model
    model_registry = self._get_model_registry(name, version)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 325, in _get_model_registry
    raise ModelNotFound(name, version)
mlserver.errors.ModelNotFound: Model my-model with version not found
I should specify that I can see from the logs that this occurs within the postprocessor container, after the request is successfully routed through the my-model container. I suspect this is because I've specified my-model as the name in the request creation, but the model_name field is required, so it's not clear to me how I can use Seldon Core inference graphs with MLServer over gRPC.
Would really appreciate any pointers here - thanks :slightly_smiling_face:
Hey @edfincham ,
Since this is a Seldon Core issue, could you raise it in either the seldon-core repo or the Seldon Community Slack?
This issue is actually not relevant anymore, as it has already been solved by Seldon Core v2.