Optimize Docker image for model serving
🛠 DevTools 🛠
Install mlflow from this PR
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10954/merge
Checkout with GitHub CLI
gh pr checkout 10954
Related Issues/PRs
Resolve #2426, #5927
What changes are proposed in this pull request?
Background
The Docker image for model serving was not optimized, and its large size has been a pain point for users (#2426, #5927). For example, a scikit-learn model of a few MB becomes a > 3.7 GB image after the build-image command. The long build time (> 5 minutes) has been another pain point.
After internal design discussion, we decided to introduce three optimizations:
- Remove Java and dependencies if the model flavor doesn't require them.
- Use Python base image instead of Ubuntu if possible.
- Remove the unnecessary conda/virtualenv layer when using the Python image.
This PR implements those changes, along with comprehensive integration tests covering (almost) all flavors.
Implementation Notes
In some cases we cannot apply the optimizations, e.g. when model_uri is not specified. To simplify the logic, I implemented the conditions as "all-or-nothing": apply all three optimizations or none. The optimizations apply when all of the following conditions are met:
- model_uri is specified.
- The user doesn't enable the --install-java flag (a new CLI param).
- The model flavor is not one of those that require Java (e.g. spark, mleap).
- The Python version can be determined from the model metadata.
If any of these isn't met, we fall back to the original Ubuntu image with virtualenv and Java. Still, the majority of usage should meet these conditions and benefit from the optimization.
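The all-or-nothing rule can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the code in this PR: the function name can_use_optimized_image, its parameters, and the JAVA_FLAVORS set are hypothetical, and only spark and mleap are named in the description as flavors that require Java.

```python
from typing import Iterable, Optional

# Hypothetical set of flavors that need Java. The PR description
# names spark and mleap as examples; the real list may differ.
JAVA_FLAVORS = {"spark", "mleap"}


def can_use_optimized_image(
    model_uri: Optional[str],
    install_java: bool,
    flavors: Iterable[str],
    python_version: Optional[str],
) -> bool:
    """Return True only when every condition for the optimized image holds."""
    return (
        model_uri is not None                        # model_uri is specified
        and not install_java                         # --install-java not enabled
        and not JAVA_FLAVORS.intersection(flavors)   # no Java-dependent flavor
        and python_version is not None               # version known from metadata
    )


# A scikit-learn model with a known Python version qualifies.
print(can_use_optimized_image("models:/my-model/1", False, ["sklearn"], "3.10.12"))
# A spark model falls back to the original Ubuntu image.
print(can_use_optimized_image("models:/my-model/1", False, ["spark"], "3.10.12"))
```

Keeping the check as one boolean makes the fallback path explicit: a single False sends the build down the original Ubuntu, virtualenv and Java route.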
Testing
For better safety, I've added integration tests covering (almost) all flavors in tests/pyfunc/test_docker_flavors.py. These tests are a bit time-consuming (~20 mins), so I marked them to be skipped in CI.
Impact
The impact depends on the actual size of the necessary dependencies and model files, but it is significant for small models. For example, an image for a small scikit-learn model becomes 0.99 GB (vs 3.7 GB originally; 73% reduction). Build time also decreases to 67 secs (vs 430 secs originally; 84% reduction). The margin could be bigger for thin/small models like OpenAI and Langchain, and smaller for large models like Pytorch and Transformers.
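The quoted percentages follow directly from the before/after figures. The short check below recomputes them; the helper name reduction_pct is only for illustration.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


size = reduction_pct(3.7, 0.99)   # image size in GB
build = reduction_pct(430, 67)    # build time in seconds
print(f"size: {size:.0f}%, build time: {build:.0f}%")
# prints "size: 73%, build time: 84%"
```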
For Reviewers
While the number of changed files is a bit large (28), most of them don't require deep review, such as test Dockerfiles and small tweaks to fixture naming (to avoid conflicts). The core logic changes reside in the following three files:
- mlflow/models/container/__init__.py
- mlflow/models/docker_utils.py
- mlflow/pyfunc/backend.py
Can we reduce the image size further?
In the scikit-learn image, most of the size comes from MLflow itself and its dependencies (0.79 GB out of 0.99 GB total), even though many of those modules and dependencies are not required for model serving.
How is this PR tested?
- [x] Existing unit/integration tests
- [x] New unit/integration tests
- [x] Manual tests
Does this PR require documentation update?
- [x] No. You can skip the rest of this section.
- [ ] Yes. I've updated:
- [ ] Examples
- [ ] API references
- [ ] Instructions
I will update deployment doc with the change accordingly (user-facing change shouldn't be too big).
Release Notes
Is this a user-facing change?
- [ ] No. You can skip the rest of this section.
- [x] Yes. Give a description of this change to be included in the release notes for MLflow users.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [x] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
Interface
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [x] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
Language
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
Integrations
- [ ] integrations/azure: Azure and Azure ML integrations
- [x] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
How should the PR be classified in the release notes? Choose one:
- [ ] rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- [ ] rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- [x] rn/feature - A new user-facing feature worth mentioning in the release notes
- [ ] rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- [ ] rn/documentation - A user-facing documentation change worth mentioning in the release notes
Documentation preview for c9403cbe29a6fc8e4ac4964a25b745b17d5b67c5 will be available here when this CircleCI job completes successfully.
More info
- Ignore this comment if this PR does not change the documentation.
- It takes a few minutes for the preview to be available.
- The preview is updated when a new commit is pushed to this PR.
- This comment was created by https://github.com/mlflow/mlflow/actions/runs/7744898028.
Overall this looks great! Can we:
- Verify that this builds in a Windows environment
- Create a follow-up ticket to add full usage and customization documentation around the introduced changes (Java being removed from the containers, now creating an opt-in experience for inclusion) into the relevant subsection tutorials in https://www.mlflow.org/docs/latest/deployment/index.html
- Set up something similar to cross-version testing that is run weekly on the mlflow-automation repo so that we have recurring CI validation of the container build process (I 100% agree that this should not be part of the PR CI process, but we should have some non-manual testing mechanism)
Thanks for the review, @BenWilson2!
> Verify that this builds in a windows environment
Sure, do you know the easiest way to test this? Remote desktop?
> Create a followup ticket to add full usage and customization documentation around the introduced changes (Java being removed from the containers, now creating an opt-in experience for inclusion) into the relevant sub section tutorials in https://www.mlflow.org/docs/latest/deployment/index.html
Yup, but Java is not totally removed; it is still installed for flavors like spark and mleap. The flag will be used for custom pyfunc models, but I will add documentation for it anyway.
> Setup something similar to cross version testing that is run weekly on the mlflow-automation repo so that we have recurring CI validation of the container build process (I 100% agree that this should not be part of the PR CI process, but we should have some non-manual testing mechanism)
Totally makes sense, will do this as part of a follow-up. Created a JIRA.
Can we see if it's possible to test really quick with https://azure.microsoft.com/en-us/products/virtual-desktop (This doesn't have to be a CI job; it's just safer to do a one-time check to see if there is any odd behavior when trying to build this PR's implementation on Windows - just to be safe :) )
Using a virtual Windows machine for testing took quite a bit of effort (permission settings, installing tools, etc.), so I ended up testing with my personal Windows laptop :p
The basic test_docker.py tests all passed.
For flavors, most passed but a few failed:
- spark: Failed during Spark installation, which is not relevant to this change (also, we don't introduce any change for Java flavors).
- tensorflow/keras/transformers: These all depend on Tensorflow. The model is logged on Windows and so has tensorflow-intel as a requirement, which is not available in the Ubuntu-based container. This should be unrelated to the change itself.
So overall I think this change shouldn't introduce new surprises for Windows users :)