Add inference graphs support
Currently we only expose the predict method. However, some orchestration frameworks, like Seldon Core, support other "inference steps", such as routing or aggregation. It would be good to explore how these could be added, probably as a custom extension.
There was an early discussion on how this could be done here: https://docs.google.com/presentation/d/1uHg7qfZxivygo5E-ExcChVqNKqoliYJrJc_j3I6VrkU/edit?usp=sharing
@adriangonz Hi, is this included in the MLServer 1.0 release?
Hey @divyadilip91,
This one is not included in the 1.0 release. However, if you're using Seldon Core, it's already possible to chain MLServer nodes together within an inference graph.
@adriangonz Thank you for the link to the document.
@adriangonz is it possible to chain MLServer nodes if we set seldon.io/no-engine: "true" to ensure Ambassador works?
Hey @kurianbenoy-sentient ,
Unfortunately, the engine / executor is the one chaining and orchestrating the requests across multiple models. So if you disable it, you'd lose that ability. As far as I know, Ambassador should work with the executor enabled, with the exception of custom endpoints.
Thanks for clarifying @adriangonz :)
Hi @adriangonz, what is the URL to use if we connect two MLServer Docker images using Seldon Core? Below is an example of the graph I have created, where two MLServer Docker containers, enmltrans and op, are connected:
graph:
  endpoint:
    type: REST
  name: enmltrans
  type: MODEL
  children:
    - endpoint:
        type: REST
      name: op
      type: MODEL
      children: []
When I try to use the URL http://ip/seldon/default/enmltrans-mlserver/v2/models/enmltranslation/infer, the first component shows a 200 in the logs, but the second component returns a 404 saying the enmltranslation model was not found.
Below is the SeldonDeployment YAML used:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: enmltrans-mlserver
  labels:
    model_name: enmltrans
    api_type: microservice
    microservice_type: ai
spec:
  protocol: kfserving
  annotations:
    project_name: enmltrans
    deployment_version: v0.1.0
    seldon.io/rest-timeout: '60000'
    seldon.io/rest-connection-timeout: '60000'
    seldon.io/grpc-read-timeout: '60000'
  name: enmltrans
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - name: enmltrans
                image: enml_mlserver_test:0.1.0
                imagePullPolicy: Always
                resources:
                  requests:
                    memory: 2Gi
                    cpu: 2
                ports:
                  - containerPort: 8080
                    name: http
                    protocol: TCP
              - name: op
                image: outputtransformer_mlserver:0.1.0
                imagePullPolicy: Always
                resources:
                  requests:
                    memory: 2Gi
                    cpu: 2
                ports:
                  - containerPort: 8083
                    name: http
                    protocol: TCP
      graph:
        endpoint:
          type: REST
        name: enmltrans
        type: MODEL
        children:
          - endpoint:
              type: REST
            name: op
            type: MODEL
            children: []
      name: pod
Please help to resolve this issue. Thanks!
Hey @divyadilip91 ,
Just for context, there are a few nuances when it comes to model names. Seldon Core will expect that, within MLServer, the model is named after the container (i.e. enmltrans or op). Since Seldon Core injects some config parameters (like the model name), this is generally the case.
However, it's also possible that MLServer has some embedded configuration that forces a different name. This is the case, for example, when you build a custom image with mlserver build, which injects the config from a model-settings.json file.
To avoid the above, you can try rebuilding your custom enml_mlserver_test:0.1.0 image, removing the name field from its local model-settings.json file. This will allow the resulting image to accept whatever name Seldon Core is trying to set.
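For reference, a minimal sketch of what the stripped-down model-settings.json could look like, where the implementation class and URI below are just placeholders for your own custom runtime rather than values taken from your image:
{
  "implementation": "models.MyCustomRuntime",
  "parameters": {
    "uri": "./model"
  }
}
With no name field present, the model should register under whatever name Seldon Core injects, i.e. the container name (enmltrans or op), so the executor can route to it.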
Hey @adriangonz,
Just piggy-backing on this issue, but I'm having trouble with gRPC and inference graphs. Specifically, I have two custom runtimes (one model node, one postprocessing node). My SeldonDeployment looks like the following:
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  annotations:
    seldon.io/grpc-max-message-size: '100000000'
    seldon.io/grpc-timeout: '100000'
  namespace: seldon
  name: my-model
  protocol: v2
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - name: my-model
                image: ...
                env:
                  - name: SELDON_LOG_LEVEL
                    value: DEBUG
              - name: postprocessor
                image: ...
                env:
                  - name: SELDON_LOG_LEVEL
                    value: DEBUG
      name: my-model
      replicas: 1
      svcOrchSpec:
        env:
          - name: SELDON_LOG_LEVEL
            value: DEBUG
      graph:
        children:
          - name: postprocessor
            type: MODEL
            children: []
        endpoint:
          type: GRPC
        name: my-model
        type: MODEL
      serviceAccountName: seldon-sa
The model-settings.json leaves the name field blank but this gets overwritten by Seldon. I then make a gRPC request as follows:
import grpc
from mlserver.grpc.dataplane_pb2_grpc import GRPCInferenceServiceStub
from mlserver.grpc.dataplane_pb2 import ModelInferRequest
import mlserver.grpc.converters as converters
import mlserver.types as types

host = "my_host:80"
deployment_name = "my-model"

# gRPC channel options (assumed here to just raise the max message size)
ch_options = [("grpc.max_receive_message_length", 100_000_000)]

with grpc.insecure_channel(host, options=ch_options) as channel:
    stub = GRPCInferenceServiceStub(channel)

    inference_request = types.InferenceRequest(
        inputs=[
            types.RequestInput(
                name="X",
                shape=...,
                datatype="BYTES",
                data=[...],
            )
        ],
    )

    request = converters.ModelInferRequestConverter.from_types(
        inference_request,
        model_name=deployment_name,
        use_raw=False,
    )

    metadata = [("seldon", deployment_name), ("namespace", "seldon")]
    response = stub.ModelInfer(request=request, metadata=metadata)
When I run the above, I get the following traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/mlserver/grpc/utils.py", line 44, in _inner
    return await f(self, request, context)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/grpc/servicers.py", line 77, in ModelInfer
    result = await self._data_plane.infer(
  File "/opt/conda/lib/python3.8/site-packages/mlserver/handlers/dataplane.py", line 95, in infer
    model = await self._model_registry.get_model(name, version)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 307, in get_model
    model_registry = self._get_model_registry(name, version)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 325, in _get_model_registry
    raise ModelNotFound(name, version)
mlserver.errors.ModelNotFound: Model my-model with version not found
I should specify that I can see from the logs that this occurs within the postprocessor container, after the request is successfully routed through the my-model container. I suspect this is because I've specified my-model as the name in the request creation, but the model_name field is required, so it's not clear to me how I can use Seldon Core inference graphs with MLServer over gRPC.
Would really appreciate any pointers here - thanks :slightly_smiling_face:
Hey @edfincham ,
Since this is a Seldon Core issue, could you raise it in either the seldon-core repo or the Seldon Community Slack?
This issue is actually not relevant anymore, as it has already been solved by Seldon Core v2.