
Simplify interface to write custom runtimes

adriangonz opened this issue 3 years ago • 0 comments

When writing custom runtimes, MLServer requires the user to know a bit about the V2 request and response structure, as well as codecs. This can add some friction when it comes to writing custom code.

One of the main aims of Tempo was to simplify how “custom models” were implemented. Therefore, it would be interesting to revisit some of these ideas to help simplify MLServer runtimes.

With this goal in mind, it would be interesting to let the user write runtimes like the one below:

import numpy as np
from mlserver import MLModel

class MyCustomRuntime(MLModel):
    # Use signatures to declare the expected request and response content types
    async def predict(self, payload: np.ndarray) -> np.ndarray:
        # Do something with the payload
        pred = payload.sum(keepdims=True)

        # Return the result as-is
        return pred

Under the hood, MLServer could look at the Python type hints to determine the right content types that should be used, and add them to the model metadata. This would allow MLServer to find the right set of codecs on the fly and encode / decode the request and response.
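To make this more concrete, here is a rough sketch of how the hint inspection could work. This is hypothetical pseudocode for the idea, not MLServer’s actual internals: the HINT_TO_CONTENT_TYPE mapping and infer_content_types helper are made up, although the content type strings (“np”, “pd”, “str”) match the ones MLServer already uses.

from typing import get_type_hints

import numpy as np
import pandas as pd

# Hypothetical mapping from Python type hints to V2 content types
HINT_TO_CONTENT_TYPE = {
    np.ndarray: "np",
    pd.DataFrame: "pd",
    str: "str",
}

def infer_content_types(predict) -> dict:
    # Read the annotations off the user's predict() signature
    hints = get_type_hints(predict)
    return_hint = hints.pop("return", None)

    return {
        # One entry per input "head", keyed by argument name
        "inputs": {name: HINT_TO_CONTENT_TYPE[hint] for name, hint in hints.items()},
        "outputs": HINT_TO_CONTENT_TYPE.get(return_hint),
    }

For the runtime above, infer_content_types(MyCustomRuntime.predict) would return {"inputs": {"payload": "np"}, "outputs": "np"}, which MLServer could then store in the model metadata and use to pick the right codecs.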

Multiple Inputs / Outputs

The V2 protocol lets the user send and return multiple input and output “heads”. To account for this, the new simplified interface could use the names of the function arguments to match them with the incoming request:

class MyCustomRuntime(MLModel):
    # Input `foo` would get passed as kwarg `foo`, and input `bar` as kwarg `bar`
    async def predict(self, foo: np.ndarray, bar: np.ndarray) -> np.ndarray:
        ...
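For illustration, the name-based dispatch could look roughly like the sketch below. Both decode_input and dispatch are hypothetical stand-ins for the real codec lookup and request handling:

import numpy as np
from mlserver.types import InferenceRequest, RequestInput

def decode_input(inp: RequestInput) -> np.ndarray:
    # Stand-in for the real codec lookup: rebuild the tensor from the raw V2 data
    return np.array(inp.data).reshape(inp.shape)

async def dispatch(model, payload: InferenceRequest):
    # One kwarg per input "head", keyed by its V2 `name` field, so a request
    # with inputs named "foo" and "bar" calls predict(foo=..., bar=...)
    kwargs = {inp.name: decode_input(inp) for inp in payload.inputs}
    return await model.predict(**kwargs)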

When encoding the response back, though, the signature itself doesn’t carry any information about the expected output names. Therefore, MLServer will need to:

  • Look at the outputs field of the model metadata, and match the returned values based on their order.
  • In the absence of the above, make up some default names (e.g. output-0, output-1, etc.).

Note that this doesn’t apply to function signatures with a single “multi-output” return type, like a Pandas DataFrame. In those cases, MLServer will look at the column names to infer the right output names.
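A rough sketch of these naming fallbacks (all helper and variable names below are hypothetical, not MLServer internals):

from typing import Any, List, Optional

import pandas as pd

def infer_output_names(result: Any, metadata_outputs: Optional[List[str]] = None) -> List[str]:
    # A single DataFrame return carries its own names: one output per column
    if isinstance(result, pd.DataFrame):
        return list(result.columns)

    # Normalise a single return value into a tuple of output "heads"
    values = result if isinstance(result, tuple) else (result,)

    # 1. Match the returned values against the model metadata, by order
    if metadata_outputs and len(metadata_outputs) >= len(values):
        return metadata_outputs[: len(values)]

    # 2. In the absence of metadata, make up default names
    return [f"output-{i}" for i in range(len(values))]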

Variable Content Types & Advanced Use Cases

In some cases though, the approach suggested above may show some limitations. For example, runtimes which accept a variable set of content types, like the MLflow or SKLearn runtimes (which accept either dataframes or numpy arrays), wouldn’t fit neatly into a single fixed signature, and the same applies to other advanced use cases.

However, these use cases should still be supported by the current approach, which would remain valid. That is, when using the “low-level” InferenceRequest and InferenceResponse types (which correspond to the respective V2 payloads), the encoding / decoding will be the user’s responsibility (i.e. they will need to call something like self.decode).
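For reference, a runtime written against that low-level interface could look roughly like the sketch below (the InferenceRequest / InferenceResponse types come from mlserver.types; the manual decoding and encoding here are illustrative, and in practice codec helpers would do this work):

import numpy as np
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput

class MyLowLevelRuntime(MLModel):
    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decoding the V2 payload is the user's responsibility here
        raw = payload.inputs[0]
        data = np.array(raw.data).reshape(raw.shape)

        pred = data.sum(keepdims=True)

        # ...and so is encoding the result back into a V2 response
        return InferenceResponse(
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name="output-0",
                    shape=list(pred.shape),
                    datatype="FP64",
                    data=pred.flatten().tolist(),
                )
            ],
        )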

adriangonz · Aug 17 '22 15:08