[FR] Enable Python clients to serve models and collect predictions from Python code
Willingness to contribute
- [ ] Yes. I can contribute this feature independently.
- [ ] Yes. I would be willing to contribute this feature with guidance from the MLflow community.
- [x] No. I cannot contribute this feature at this time.
Proposal Summary
Enable Python clients to serve models and collect predictions from Python code.
Motivation
- What is the use case for this feature? This is very similar to the SageMaker use case: code base A defines a model, and code base B needs to consume that model's predictions. Code base B could simply load the model, but then it inherits all of code base A's dependencies. If code base B also needs predictions from a model defined in code base C, then code bases A and C must have mutually compatible dependencies before both can be used from code base B.
- Why is this use case valuable to support for MLflow users in general? You don't always need to provision a separate machine whose sole purpose is to serve the model. Some pipelines only need to consume predictions for a period of time and then shut down.
- Why is this use case valuable to support for your project(s) or organization? For essentially the same reasons as the general use case.
- Why is it currently difficult to achieve this use case? (please be as specific as possible about why related MLflow features and components are insufficient) On the master branch you can get fairly close, but the code is very repetitive and relies on technically protected functionality that may change:
```python
import time

import pandas as pd
from mlflow.models.cli import _get_flavor_backend  # protected API, may change between releases

backend = _get_flavor_backend('models:/ElasticnetWineModel/1')
# Start the scoring server in the background (synchronous=False).
backend.serve('models:/ElasticnetWineModel/1', port='9090', host='127.0.0.1', enable_mlserver=False, synchronous=False)
time.sleep(2)  # crude wait for the server to come up

df = pd.DataFrame(
    columns=["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],
    data=[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66], [12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]],
)

# predict() requires the model URI yet again, plus input/output file paths.
backend.predict('models:/ElasticnetWineModel/1', ...)
```
Notice that you have to resupply the model URI multiple times. It would also be nice if you could just pass a DataFrame, but backend.predict expects an input file and delivers an output file.
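A rough workaround today is to skip backend.predict entirely and POST the DataFrame to the /invocations endpoint of the server started above. The sketch below assumes the server is reachable on 127.0.0.1:9090 and that the scoring server accepts a pandas split-oriented payload (recent MLflow releases expect it under a dataframe_split key; older ones accept the columns/data dict as the top-level body), so treat it as an illustration rather than a supported API.

```python
import pandas as pd
import requests

def predict_via_http(df: pd.DataFrame, host: str = "127.0.0.1", port: int = 9090):
    """Rough workaround: send a DataFrame to the locally served model over HTTP.

    Assumes a scoring server that accepts a pandas "split"-oriented payload
    under the "dataframe_split" key; adjust the payload for older servers.
    """
    payload = {
        "dataframe_split": {
            "columns": df.columns.tolist(),
            "data": df.values.tolist(),
        }
    }
    resp = requests.post(f"http://{host}:{port}/invocations", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Usage with the server and DataFrame from the snippet above:
# predictions = predict_via_http(df)
```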
What component(s), interfaces, languages, and integrations does this feature affect?
Components
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [x] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
Interfaces
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
Languages
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
Integrations
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
Details
This feature would make simpler, pipeline-driven applications much easier to build. Ideally, as a user I could supply a known model URI to some functionality that serves the model in an isolated environment and lets me submit input and collect predictions, all in the same process.
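To make that concrete, here is a purely hypothetical sketch of the kind of interface that would cover this use case; serve_local and the returned client are invented names for illustration, not existing MLflow APIs, and the actual design would of course be up to the maintainers.

```python
import pandas as pd
import mlflow

df = pd.DataFrame(
    columns=["alcohol", "chlorides", "citric acid"],  # abbreviated feature set for the sketch
    data=[[12.8, 0.029, 0.48]],
)

# Hypothetical API (does not exist in MLflow today): supply the model URI once,
# serve the model in an isolated environment, and collect predictions, all
# from the same Python process.
with mlflow.models.serve_local("models:/ElasticnetWineModel/1") as client:
    predictions = client.predict(df)
```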
@efagerberg Apologies for the delay, and thank you for this feature request. We are currently working on solutions to serve containerized models and expose corresponding inference APIs within a driver Python process.
Great to hear, thanks.