[FR] Enable Python clients to serve models and collect predictions from Python code
Willingness to contribute
- [ ] Yes. I can contribute this feature independently.
- [ ] Yes. I would be willing to contribute this feature with guidance from the MLflow community.
- [x] No. I cannot contribute this feature at this time.
Proposal Summary
Enable Python clients to serve models and collect predictions from Python code.
Motivation
- What is the use case for this feature? This is very similar to the SageMaker use case: code base A defines a model, and code base B needs to consume that model's predictions. Code base B could simply load the model, but then it inherits all of code base A's dependencies. If code base B also needs predictions from a model defined in code base C, then code bases A and C must have mutually compatible dependencies before both can be used from code base B.
- Why is this use case valuable to support for MLflow users in general? You don't always need to provision a separate machine whose sole purpose is to serve the model. Some pipelines only need to consume predictions for a period of time and then shut down.
- Why is this use case valuable to support for your project(s) or organization? For essentially the same reasons as the general use case.
- Why is it currently difficult to achieve this use case? (please be as specific as possible about why related MLflow features and components are insufficient) On the master branch you can get fairly close, but the code is very repetitive and relies on technically protected functionality that may change:
```python
import time

import pandas as pd
from mlflow.models.cli import _get_flavor_backend  # protected API, may change between releases

backend = _get_flavor_backend('models:/ElasticnetWineModel/1')
# Start the scoring server in the background (synchronous=False).
backend.serve('models:/ElasticnetWineModel/1', port='9090', host='127.0.0.1', enable_mlserver=False, synchronous=False)
time.sleep(2)  # crude wait for the server to come up

df = pd.DataFrame(
    columns=["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],
    data=[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66], [12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]],
)

# predict() requires the model URI yet again, plus input/output file paths.
backend.predict('models:/ElasticnetWineModel/1', ...)
```
Notice that you have to resupply the model URI multiple times. It would also be nice if you could just pass a DataFrame, but backend.predict expects an input file and delivers an output file.
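A rough workaround today is to skip backend.predict entirely and POST the DataFrame to the /invocations endpoint of the server started above. The sketch below assumes the server is reachable on 127.0.0.1:9090 and that the scoring server accepts a pandas split-oriented payload (recent MLflow releases expect it under a dataframe_split key; older ones accept the columns/data dict as the top-level body), so treat it as an illustration rather than a supported API.

```python
import pandas as pd
import requests

def predict_via_http(df: pd.DataFrame, host: str = "127.0.0.1", port: int = 9090):
    """Rough workaround: send a DataFrame to the locally served model over HTTP.

    Assumes a scoring server that accepts a pandas "split"-oriented payload
    under the "dataframe_split" key; adjust the payload for older servers.
    """
    payload = {
        "dataframe_split": {
            "columns": df.columns.tolist(),
            "data": df.values.tolist(),
        }
    }
    resp = requests.post(f"http://{host}:{port}/invocations", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Usage with the server and DataFrame from the snippet above:
# predictions = predict_via_http(df)
```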
What component(s), interfaces, languages, and integrations does this feature affect?
Components
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [x] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
Interfaces
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
Languages
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
Integrations
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
Details
This feature would make simpler, pipeline-driven applications much easier to build. Ideally, as a user I could supply a known model URI to some functionality that serves the model in an isolated environment and lets me submit input and collect predictions, all in the same process.
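To make that concrete, here is a purely hypothetical sketch of the kind of interface that would cover this use case; serve_local and the returned client are invented names for illustration, not existing MLflow APIs, and the actual design would of course be up to the maintainers.

```python
import pandas as pd
import mlflow

df = pd.DataFrame(
    columns=["alcohol", "chlorides", "citric acid"],  # abbreviated feature set for the sketch
    data=[[12.8, 0.029, 0.48]],
)

# Hypothetical API (does not exist in MLflow today): supply the model URI once,
# serve the model in an isolated environment, and collect predictions, all
# from the same Python process.
with mlflow.models.serve_local("models:/ElasticnetWineModel/1") as client:
    predictions = client.predict(df)
```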
@efagerberg Apologies for the delay, and thank you for this feature request. We are currently working on solutions to serve containerized models and expose corresponding inference APIs within a driver Python process.
Great to hear, thanks.