yocto-gl
yocto-gl copied to clipboard
[BUG] MLFlow needs to read .dockerignore and .gitignore or have its own .ignore file
Willingness to contribute
- [ ] Yes. I can contribute a fix for this bug independently.
- [x] Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
- [ ] No. I cannot contribute a bug fix at this time.
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow): No.
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS X
- MLflow installed from (source or binary): pip
-
MLflow version (run
mlflow --version
): 1.12.1 - Python version: 3.8
- npm version, if running the dev UI: N/A
-
Exact command to reproduce:
mlflow run .
Describe the problem
MLFlow run
command copies the entire work directory without taking .dockerignore
and .gitignore
into account.
This opens it up to copying heavy directories like venv
(virtualenv), secrets, environment variables (.e.g stored in .env
), etc.
Why the Docker env ignores .dockerignore
is an existing issue #2016.
This can be mitigated by adding an ignore=func
parameter to shutil.copytree
used here. This would need to implement either .dockerignore
or .gitignore
patterns, or implement its own .mlflowignore
file. This would also make MLFlow run faster because it won't need to copy heavy files around (twice with the .dockerignore
bug).
Code to reproduce issue
Use any of the example projects. Add any file into the folder and add it to .dockerignore
. Build the image and note the file being present in the image.
Other info / logs
N/A
What component(s), interfaces, languages, and integrations does this bug affect?
Components
- [ ]
area/artifacts
: Artifact stores and artifact logging - [x]
area/build
: Build and test infrastructure for MLflow - [ ]
area/docs
: MLflow documentation pages - [x]
area/examples
: Example code - [ ]
area/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registry - [ ]
area/models
: MLmodel format, model serialization/deserialization, flavors - [ ]
area/projects
: MLproject format, project running backends - [ ]
area/scoring
: Local serving, model deployment tools, spark UDFs - [ ]
area/server-infra
: MLflow server, JavaScript dev server - [ ]
area/tracking
: Tracking Service, tracking client APIs, autologging
Interface
- [ ]
area/uiux
: Front-end, user experience, JavaScript, plotting - [ ]
area/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Models - [ ]
area/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registry - [ ]
area/windows
: Windows support
Language
- [ ]
language/r
: R APIs and clients - [ ]
language/java
: Java APIs and clients - [ ]
language/new
: Proposals for new client languages
Integrations
- [ ]
integrations/azure
: Azure and Azure ML integrations - [ ]
integrations/sagemaker
: SageMaker integrations - [ ]
integrations/databricks
: Databricks integrations
@levkk Did your commit fix this issue? If so, how is it used?
Hey @zimonitrome , I have stopped using MLFlow since then and haven't had a chance to work on this further. The commit I proposed uses Python's shutil
ignore
parameter which is very similar to how Docker and git implement this functionality in .dockerignore
and .gitignore
respectively. Sadly, that commit was never merged.
Thanks for the follow-up @levkk. Sad that it never got merged :(
May I ask if you migrated to something you enjoy more than MLFlow?
Yup, we just started using Docker and Dockerfiles directly and launch the containers using our existing CI/CD. The model registry MLFlow provided was neat though, so we kept using that just for metadata tracking.
This would be very useful, +1
another +1 for this. It would be very useful.
another +1, would be very useful.