Optimize Docker image for model serving

🛠 DevTools 🛠

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10954/merge

Checkout with GitHub CLI

gh pr checkout 10954

Related Issues/PRs

Resolve #2426, #5927

What changes are proposed in this pull request?

Background

Docker images for model serving were not optimized, and their large size has been a pain point for users (#2426, #5927). For example, a scikit-learn model of a few MB becomes a > 3.7 GB image after the build-image command. The long build time (> 5 minutes) has been another pain point.

After an internal design discussion, we decided to introduce three optimizations:

1. Remove Java and its dependencies if the model flavor doesn't require them.
2. Use a Python base image instead of Ubuntu where possible.
3. Remove the unnecessary conda/virtualenv layer when using the Python image.

This PR implements those changes, along with comprehensive integration tests covering (almost) all flavors.

Implementation Notes

In some cases the optimizations cannot be applied, e.g. when model_uri is not specified. To simplify the logic, I implemented the conditions as "all-or-nothing": either apply 1+2+3 together or do nothing. The optimizations apply when all of the following conditions are met:

model_uri is specified.
The user doesn't enable the --install-java flag (a new CLI param).
The model flavor is not one of those that require Java (e.g. spark, mleap).
The Python version can be determined from the model metadata.

If any of these conditions is not met, we fall back to the original Ubuntu image with virtualenv and Java. Still, the majority of usage should meet these conditions and benefit from the optimization.
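The conditions above can be read as a single predicate. The following is only an illustrative sketch, not the code in this PR: the function name, its parameters, and the `JAVA_FLAVORS` set are hypothetical, and the set contains only the two flavors named as examples above.

```python
from typing import Iterable, Optional

# Assumption: only the two flavors named in the description; the real list may differ.
JAVA_FLAVORS = frozenset({"spark", "mleap"})


def can_use_optimized_image(
    model_uri: Optional[str],
    install_java: bool,
    flavors: Iterable[str],
    python_version: Optional[str],
) -> bool:
    """Return True only when every condition holds (all-or-nothing)."""
    if not model_uri:
        return False  # No model to inspect at build time.
    if install_java:
        return False  # The user explicitly asked for Java.
    if JAVA_FLAVORS.intersection(flavors):
        return False  # At least one flavor needs Java.
    if not python_version:
        return False  # Cannot choose a matching Python base image.
    return True


# Hypothetical inputs for illustration only.
print(can_use_optimized_image("models:/my_model/1", False, ["python_function", "sklearn"], "3.10.13"))
print(can_use_optimized_image("models:/my_model/1", False, ["python_function", "spark"], "3.10.13"))
```

The first call prints True and the second prints False, because a single failing condition is enough to fall back to the original image.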

Testing

For better safety, I've added integration tests covering (almost) all flavors in tests/pyfunc/test_docker_flavors.py. These tests are time-consuming (~20 mins), so I marked them to be skipped in CI.

Impact

The impact depends on the actual size of the necessary dependencies and model files, but it is significant for small models. For example, the image for a small scikit-learn model shrinks to 0.99 GB (vs. 3.7 GB originally; 73% reduction). Build time also drops to 67 secs (vs. 430 secs originally; 84% reduction). The margin could be bigger for thin/small models like OpenAI and Langchain, and smaller for large models like Pytorch and Transformers.
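As a quick check of the percentages quoted above, the reductions can be recomputed from the before/after figures. The helper name is hypothetical; the numbers are the ones reported in this description.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


size = percent_reduction(3.7, 0.99)  # image size in GB
build = percent_reduction(430, 67)   # build time in seconds
print(f"size: {size:.0f}%, build time: {build:.0f}%")
# prints "size: 73%, build time: 84%"
```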

For Reviewers

While the number of changed files is a bit large (28), most of them don't require deep review, such as test Dockerfiles and small tweaks to fixture naming (to avoid conflicts). The core logic changes reside in the following three files:

mlflow/models/container/__init__.py
mlflow/models/docker_utils.py
mlflow/pyfunc/backend.py

Can we reduce the image size further?

In the scikit-learn image, most of the size comes from MLflow itself and its dependencies (0.79 GB of the 0.99 GB total), even though many of those modules and dependencies are not required for model serving.

How is this PR tested?

[x] Existing unit/integration tests
[x] New unit/integration tests
[x] Manual tests

Does this PR require documentation update?

[x] No. You can skip the rest of this section.
[ ] Yes. I've updated:
- [ ] Examples
- [ ] API references
- [ ] Instructions

I will update the deployment doc accordingly (the user-facing change shouldn't be too big).

Release Notes

Is this a user-facing change?

[ ] No. You can skip the rest of this section.
[x] Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

[ ] area/artifacts: Artifact stores and artifact logging
[ ] area/build: Build and test infrastructure for MLflow
[ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
[ ] area/docs: MLflow documentation pages
[ ] area/examples: Example code
[ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[ ] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
[ ] area/projects: MLproject format, project running backends
[x] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
[ ] area/server-infra: MLflow Tracking server backend
[ ] area/tracking: Tracking Service, tracking client APIs, autologging

Interface

[ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
[x] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

Language

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

Integrations

[ ] integrations/azure: Azure and Azure ML integrations
[x] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

[ ] rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
[ ] rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
[x] rn/feature - A new user-facing feature worth mentioning in the release notes
[ ] rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
[ ] rn/documentation - A user-facing documentation change worth mentioning in the release notes

Jan 30 '24 13:01 B-Step62

Documentation preview for c9403cbe29a6fc8e4ac4964a25b745b17d5b67c5 will be available here when this CircleCI job completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/7744898028.

Jan 30 '24 13:01 github-actions[bot]

Overall this looks great! Can we:

Verify that this builds in a Windows environment
Create a followup ticket to add full usage and customization documentation around the introduced changes (Java being removed from the containers, now creating an opt-in experience for inclusion) into the relevant sub section tutorials in https://www.mlflow.org/docs/latest/deployment/index.html
Set up something similar to cross version testing that is run weekly on the mlflow-automation repo so that we have recurring CI validation of the container build process (I 100% agree that this should not be part of the PR CI process, but we should have some non-manual testing mechanism)

Jan 31 '24 18:01 BenWilson2

Thanks for the review, @BenWilson2!

> Verify that this builds in a Windows environment

Sure. Do you know the easiest way to test this? Remote desktop?

> Create a followup ticket to add full usage and customization documentation around the introduced changes (Java being removed from the containers, now creating an opt-in experience for inclusion) into the relevant sub section tutorials in https://www.mlflow.org/docs/latest/deployment/index.html

Yup, but Java is not removed entirely; it is still installed for flavors like spark and mleap. The flag will mainly be used for custom pyfunc models, but I will add documentation for it anyway.

> Set up something similar to cross version testing that is run weekly on the mlflow-automation repo so that we have recurring CI validation of the container build process (I 100% agree that this should not be part of the PR CI process, but we should have some non-manual testing mechanism)

Totally makes sense, I will do this as part of the follow-up. Created a JIRA.

Jan 31 '24 22:01 B-Step62

Can we see if it's possible to test really quick with https://azure.microsoft.com/en-us/products/virtual-desktop (This doesn't have to be a CI job; it's just safer to do a one-time check to see if there is any odd behavior when trying to build this PR's implementation on Windows - just to be safe :) )

Feb 01 '24 00:02 BenWilson2

Using a virtual Windows machine for testing took quite a bit of effort (permission settings, installing tools, etc.), so I ended up testing on my personal Windows laptop :p

The basic tests in test_docker.py all passed.

For flavors, most passed but a few failed:

spark: Failed during the Spark installation itself, which is unrelated to this change (we also don't introduce any change for Java flavors).
tensorflow/keras/transformers: These all depend on TensorFlow. The model was logged on Windows, so tensorflow-intel was recorded as a requirement, but that package is not available in the Ubuntu-based container. This should be unrelated to the change itself.
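A mismatch like the tensorflow-intel one can be spotted before building by scanning the model's logged requirements. The sketch below is only an illustration: the helper name and the `WINDOWS_ONLY_PACKAGES` set are hypothetical (only tensorflow-intel comes from this thread), and the requirement lines and version pins are made-up sample data.

```python
import re

# Assumption: illustrative set; only tensorflow-intel is mentioned in this thread.
WINDOWS_ONLY_PACKAGES = {"tensorflow-intel"}


def find_windows_only(requirements):
    """Return the requirement lines whose package name is in the Windows-only set."""
    found = []
    for line in requirements:
        stripped = line.strip()
        # Take the text before the first version specifier, extra, marker, or space.
        name = re.split(r"[<>=!~\[; ]", stripped, maxsplit=1)[0].lower()
        if name in WINDOWS_ONLY_PACKAGES:
            found.append(stripped)
    return found


# Made-up requirements, as they might look for a model logged on Windows.
reqs = ["mlflow==2.10.0", "tensorflow-intel==2.15.0", "numpy"]
print(find_windows_only(reqs))
# prints ['tensorflow-intel==2.15.0']
```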

So overall I think this change shouldn't introduce new surprises for Windows users :)

Feb 01 '24 10:02 B-Step62
