mlflow-snowflake
Snowflake MLflow Plugins
Currently provides an experimental Snowflake MLflow Deployment Plugin. It enables MLflow users to easily deploy externally trained, MLflow-packaged models to Snowflake.
This plugin implements the Python API and CLI for MLflow deployment plugins.
Usage
Installation
Please find the latest release here to download the latest wheel, then install it locally:
pip install <local_path_to_wheel>
This installs the package under the name snowflake-mlflow.
E2E Examples
Please take a look at the example notebooks here.
Session connection
A Snowflake session for model deployment can be established in two ways.
Python API
from snowflake.mlflow import create_session
from mlflow.deployments import get_deploy_client
connection_parameters = dict()
create_session(connection_parameters)
target_uri = 'snowflake'
deployment_client = get_deploy_client(target_uri)
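For illustration only, connection_parameters can be populated with standard Snowpark connection parameters; the values below are placeholders, not a working configuration.
# Placeholder Snowpark connection parameters; substitute your own account details.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user_name>",
    "password": "<password>",
    "role": "<role_name>",
    "warehouse": "<warehouse_name>",
    "database": "<database_name>",
    "schema": "<schema_name>",
}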
SnowSQL Configuration file
The SnowSQL configuration file is a familiar concept among existing SnowSQL CLI users and is the necessary way to establish a connection to Snowflake if you intend to use the MLflow CLI for model deployment.
For the Snowflake deployment plugin, the target_uri needs to have the snowflake scheme. Connection parameters can be specified by adding ?connection={CONNECTION_NAME}. The CONNECTION_NAME references the connection specified in the SnowSQL configuration file, e.g. snowflake:/?connection=connections.ml.
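For illustration only, a matching connections.ml entry in the SnowSQL configuration file (typically ~/.snowsql/config) could look like the sketch below; all values are placeholders.
[connections.ml]
accountname = <account_identifier>
username = <user_name>
password = <password>
rolename = <role_name>
warehousename = <warehouse_name>
dbname = <database_name>
schemaname = <schema_name>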
Specification for target_uri
If you are using the MLflow Python API and have already created a Snowflake session, the target_uri is snowflake.
If you are using the MLflow CLI, you need to specify the connection name based on your SnowSQL configuration file. In this case, your target_uri would be snowflake:/?connection={connection_name}. The library takes care of creating the Snowflake session for you.
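As a hedged example, assuming the connections.ml entry sketched above exists in your SnowSQL configuration file, listing deployments through the MLflow CLI might look like:
mlflow deployments list -t "snowflake:/?connection=connections.ml"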
Supported APIs
The following APIs are supported through both the Python API and the CLI.
Python API | CLI
---|---
create_deployment(name, model_uri, flavor, config) | mlflow deployments create
delete_deployment(name) | mlflow deployments delete
get_deployment(name) | mlflow deployments get
list_deployments | mlflow deployments list
predict(deployment_name, df) | mlflow deployments predict
- create_deployment
Args:
name: Unique name to use for the deployment.
model_uri: URI of the model to deploy.
flavor (optional): Model flavor to deploy. If unspecified, will infer
based on model metadata. Defaults to None.
config (optional): Snowflake-specific configuration for the deployment.
Defaults to None.
Detailed configuration options:
max_batch_size (int): Max batch size for a single vectorized UDF invocation.
The size is not guaranteed by Snowpark.
persist_udf_file (bool): Whether to keep the UDF file generated.
test_data_X (pd.DataFrame): 2d dataframe used as input test data.
test_data_y (pd.Series): 1d series used as expected prediction results.
During testing, model predictions are compared with the expected predictions given in `test_data_y`.
For comparing regression results, 1e-7 precision is used.
use_latest_package_version (bool): Whether to use the latest package versions available in the Snowflake conda channel.
Defaults to True. Setting this flag to True greatly increases the chance of successfully deploying
the model to the Snowflake warehouse.
stage_location (str, optional): Stage location to store the UDF and dependencies (format: `@my_named_stage`).
It can be any stage other than temporary stages and external stages. If not specified,
the UDF deployment is temporary and tied to the session. Defaults to None.
Detailed configuration options for create_deployment can also be retrieved with mlflow deployments help -t snowflake
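For illustration, a Python create_deployment call that sets a few of these options might look like the sketch below; the deployment name, model URI, and stage name are placeholders.
deployment_client.create_deployment(
    name="my_model_x",                    # placeholder deployment name
    model_uri="runs:/<run_id>/model",     # placeholder MLflow model URI
    config={
        "max_batch_size": 100,
        "use_latest_package_version": True,
        "stage_location": "@my_named_stage",  # placeholder named stage
    },
)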
- predict
Args:
deployment_name: Name of the deployment.
df: Either pandas DataFrame or Snowpark DataFrame.
Returns:
Result as a pandas DataFrame or Snowpark DataFrame.
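As a sketch, reusing deployment_client from the session example above with a placeholder pandas DataFrame:
import pandas as pd

# Placeholder input; column names and types must match the deployed model's features.
input_df = pd.DataFrame({"col1": [1.0], "col2": [2.0]})
predictions = deployment_client.predict("my_model_x", input_df)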
Supported Model types
The following list was last updated on 02/13/2023. If your intended model type is not listed below, please feel free to reach out by creating an issue.
- keras
- pytorch
- sklearn
- tensorflow
- onnx
- xgboost
- lightgbm
- spaCy
- statsmodels
- pmdarima
- prophet
Deployment Name Convention In SQL
To use a deployed model named my_model_x in a Snowflake SQL context, invoke it with the MLFLOW$ prefix:
SELECT MLFLOW$MY_MODEL_X(col1, col2, ..., colN)
Development Setup
- Clone the repo locally.
- Install needed dev dependencies by
pip install -r dev_requirements.txt
- A fresh virtual environment with Python 3.8 is recommended.
- Install the package in local editable mode for development by
pip install -e .
- Run unit tests by
pytest tests/
Contributing
Please refer to CONTRIBUTING.md.